diff --git a/miniVite b/miniVite
deleted file mode 160000
index f4367f1..0000000
--- a/miniVite
+++ /dev/null
@@ -1 +0,0 @@
-Subproject commit f4367f1c5d64034a3ee09c4de5397767c25eafb2
diff --git a/miniVite/FAQS b/miniVite/FAQS
new file mode 100644
index 0000000..11b66fe
--- /dev/null
+++ b/miniVite/FAQS
@@ -0,0 +1,270 @@
+*****************
+* miniVite FAQs *
+*****************
+----------------------------------------------------
+FYI, typical "How to run" queries are addressed from Q5
+onward.
+
+Please send your suggestions for improving this FAQ
+to zsayanz at gmail dot com OR hala at pnnl dot gov.
+----------------------------------------------------
+
+-------------------------------------------------------------------------
+Q1. What is graph community detection?
+-------------------------------------------------------------------------
+
+A1. In most real-world graphs/networks, the nodes/vertices tend to be 
+organized into tightly-knit modules known as communities or clusters,
+such that nodes within a community are more likely to be "related" to 
+one another than they are to the rest of the network. The goodness of 
+partitioning into communities is typically measured using a metric 
+called modularity. Community detection is the method of identifying 
+these clusters or communities in graphs.
+
+[References]
+
+Fortunato, Santo. "Community detection in graphs." Physics reports 
+486.3-5 (2010): 75-174. https://arxiv.org/pdf/0906.0612.pdf
+
+--------------------------------------------------------------------------
+Q2. What is miniVite? 
+--------------------------------------------------------------------------
+
+A2. miniVite is a distributed-memory code (or mini application) that 
+performs partial graph community detection using the Louvain method. 
+The Louvain method is a multi-phase, iterative heuristic that performs 
+modularity optimization for graph community detection. miniVite only 
+performs the first phase of the Louvain method.
+
+[Code]
+
+https://github.com/Exa-Graph/miniVite
+http://hpc.pnl.gov/people/hala/grappolo.html
+
+[References]
+
+Blondel, Vincent D., et al. "Fast unfolding of communities in large 
+networks." Journal of statistical mechanics: theory and experiment 
+2008.10 (2008): P10008.
+
+Ghosh S, Halappanavar M, Tumeo A, Kalyanaraman A, Gebremedhin AH. 
+miniVite: A Graph Analytics Benchmarking Tool for Massively Parallel 
+Systems.
+
+---------------------------------------------------------------------------
+Q3. What is the parent application of miniVite? How are they different?
+---------------------------------------------------------------------------
+
+A3. miniVite is derived from Vite, which implements the multi-phase 
+Louvain method. Apart from a parallel baseline version, Vite provides 
+a number of heuristics (such as early termination, threshold cycling and 
+incomplete coloring) that can improve the scalability and quality of 
+community detection. In contrast, miniVite provides only a parallel 
+baseline version, and has an option to select different MPI communication 
+methods (such as send/recv, collectives and RMA) for one of the most 
+communication-intensive portions of the code. miniVite also includes an 
+in-memory random geometric graph generator, making it convenient for 
+users to run miniVite without any external files. Vite can also convert 
+graphs from different native formats (like matrix market, SNAP, edge 
+list, DIMACS, etc.) to the binary format that both Vite and miniVite 
+require.
+
+[Code]
+
+http://hpc.pnl.gov/people/hala/grappolo.html
+
+[References]
+
+Ghosh S, Halappanavar M, Tumeo A, Kalyanaraman A, Lu H, Chavarria-Miranda D, 
+Khan A, Gebremedhin A. Distributed louvain algorithm for graph community 
+detection. In 2018 IEEE International Parallel and Distributed Processing 
+Symposium (IPDPS) 2018 May 21 (pp. 885-895). IEEE.
+
+Ghosh S, Halappanavar M, Tumeo A, Kalyanaraman A, Gebremedhin AH. 
+Scalable Distributed Memory Community Detection Using Vite. 
+In 2018 IEEE High Performance extreme Computing Conference (HPEC) 2018 
+Sep 25 (pp. 1-7). IEEE.
+
+-----------------------------------------------------------------------------
+Q4. Is there a shared-memory equivalent of Vite/miniVite?
+-----------------------------------------------------------------------------
+
+A4. Yes, Grappolo performs shared-memory community detection using the 
+Louvain method. Apart from community detection, Grappolo has routines for 
+matrix reordering as well.
+
+[Code]
+
+http://hpc.pnl.gov/people/hala/grappolo.html
+
+[References]
+
+Lu H, Halappanavar M, Kalyanaraman A. Parallel heuristics for scalable 
+community detection. Parallel Computing. 2015 Aug 1;47:19-37.
+
+Halappanavar M, Lu H, Kalyanaraman A, Tumeo A. Scalable static and dynamic 
+community detection using grappolo. In High Performance Extreme Computing 
+Conference (HPEC), 2017 IEEE 2017 Sep 12 (pp. 1-6). IEEE.
+
+------------------------------------------------------------------------------
+Q5. How does one perform strong scaling analysis using miniVite? How to 
+determine 'good' candidates (input graphs) that can be used for strong 
+scaling runs? How much time is approximately spent in performing I/O?
+------------------------------------------------------------------------------
+
+A5. Use a large graph as an input, preferably with over a billion edges. Not all 
+large graphs have a good community structure, so you may need a few trials to 
+identify one that serves your purpose. Graphs can be obtained from various 
+websites serving as repositories, such as the Sparse TAMU collection[1], the 
+SNAP repository[2] and the MIT Graph Challenge website[3], to name a few of the 
+prominent ones. You can convert graphs from their native format to 
+the binary format that miniVite requires, using the converters in Vite (please 
+see README). If your graph is in Webgraph[4] format, you can convert it 
+to an edge list first (example code snippet below), before passing it on to Vite 
+for subsequent binary conversion.
+
+#include "offline_edge_iterator.hpp"
+...
+using namespace webgraph::ascii_graph;
+
+// argv[1]: input Webgraph file, argv[2]: output edge list file
+std::ofstream ofile(argv[2]);
+offline_edge_iterator itor(argv[1]), end;
+
+// write each edge as a "source target" pair, one per line
+while( itor != end ) {
+    ofile << itor->first << " " << itor->second << "\n";
+    ++itor;
+}
+ofile.close();
+...
+
+Due to its simple vertex-based distribution, miniVite takes about 2-4s to read a 55GB 
+binary file if you use a burst buffer (Cray DataWarp) or Lustre striping (about 25 OSTs, 
+default 1M blocks). Hence, the overall I/O time that we have observed in most cases is 
+within 1/2% of the overall execution time.
+
+[1] https://sparse.tamu.edu/
+[2] http://snap.stanford.edu/data
+[3] http://graphchallenge.mit.edu/data-sets
+[4] http://webgraph.di.unimi.it/
+
+-----------------------------------------------------------------------------------
+Q6. How does one perform weak scaling analysis using miniVite? How does one scale
+the graphs with processes?
+-----------------------------------------------------------------------------------
+
+A6. miniVite has an in-memory random geometric graph generator (please see
+README) that can be used for weak-scaling analysis. An n-D random geometric graph 
+(RGG) is generated by randomly placing N vertices in an n-D space and connecting 
+pairs of vertices whose Euclidean distance is less than or equal to d. We only 
+consider 2D RGGs contained within a unit square, [0,1]^2. We distribute the domain 
+such that each process receives N/p vertices (where p is the total 
+number of processes). 
+
+Each process owns (1 * 1/p) portion of the unit square and d is computed as (please 
+refer to Section 4 of miniVite paper for details): 
+
+d = (dc + dt)/2;
+where, dc = sqrt(ln(N) / (pi*N)); dt = sqrt(2.0736 / (pi*N))
+
+Therefore, the number of vertices (N) passed during miniVite execution on p
+processes must satisfy the condition -- 1/p > d.
+
+Please note, the default distribution of the graph generated by the built-in random 
+geometric graph generator causes a process to only communicate with its two 
+immediate neighbors. If you want to increase the communication intensity for 
+generated graphs, please use the "-p" option to specify an extra percentage of edges 
+that will be generated, linking random vertices. As a side-effect, this option 
+significantly increases the time required to generate the graph.
+
+------------------------------------------------------------------------------
+Q7. Does Vite (the parent application to miniVite) have an in-built graph 
+generator?
+------------------------------------------------------------------------------
+
+A7. At present, Vite does not have a built-in graph generator like the one in 
+miniVite, so we rely on users providing external graphs for Vite (strong/weak 
+scaling) analysis. However, Vite has bindings to NetworKit[5], and users can use 
+those bindings to generate graphs of their choice from Vite (refer to the 
+README). Generating large graphs in this manner can take a lot of time, since 
+there are intermediate copies and the graph generators themselves may be serial 
+or may use threads on a shared-memory system. We do not plan on supporting the 
+NetworKit bindings in the future.
+
+[5] https://networkit.github.io/
+
+------------------------------------------------------------------------------
+Q8. Does providing a larger input graph translate to comparatively larger 
+execution times? Is it possible to control the execution time for a particular
+graph?
+------------------------------------------------------------------------------
+
+A8. No. A relatively small graph can run for many iterations, whereas a larger 
+graph may converge in only a few iterations. Since miniVite is 
+iterative, the final number of iterations to convergence (and hence, execution 
+time) depends on the structure of the graph. It is, however, possible to exit 
+early by passing a larger threshold using the "-t <...>" option. The default 
+threshold (or tolerance) is 1.0E-06; a larger value, e.g., "-t 1.0E-03", 
+should reduce the overall execution time for all graphs in 
+general (at least w.r.t. miniVite, which only executes the first phase of the 
+Louvain method).
+
+------------------------------------------------------------------------------
+Q9. Is there an option to add some noise in the generated random geometric 
+graphs?
+------------------------------------------------------------------------------
+
+A9. Yes, the "-p <percent>" option allows extra edges to be added between 
+random vertices (see README). This increases the overall communication, but 
+also affects the structure of communities in the generated graph. 
+Adding extra edges to the generated graph will 
+most probably reduce the global modularity, and the number of iterations to 
+convergence is expected to decrease. 
+The maximum number of edges that can be added is bounded by INT_MAX; at 
+present, we do not handle data ranges larger than INT_MAX.
+
+------------------------------------------------------------------------------
+Q10. What are the steps required for using real-world graphs as an input to
+miniVite?
+------------------------------------------------------------------------------
+
+A10. First, please download Vite (parent application of miniVite) from: 
+http://hpc.pnl.gov/people/hala/grappolo.html
+
+Graphs/Sparse matrices come in several native formats (matrix market, SNAP, 
+DIMACS, etc.). Vite has several options to convert graphs from native to the 
+binary format that miniVite requires (please take a look at the Vite README).
+
+As an example, you can download the Friendster file from: 
+https://sparse.tamu.edu/SNAP/com-Friendster
+The option to convert Friendster to binary using Vite's converter is as follows 
+(please note, this part is serial): 
+
+$VITE_BIN_PATH/bin/./fileConvertDist -f $INPUT_PATH/com-Friendster.mtx \
+    -m -o $OUTPUT_PATH/com-Friendster.bin
+
+After the conversion, you can run miniVite with the binary file obtained
+from the previous step:
+
+mpiexec -n <...> $MINIVITE_PATH/./dspl -r <processes-per-node> \
+    -f $FILE_PATH/com-Friendster.bin
+
+--------------------------------------------------------------------------------
+Q11. miniVite is scalable for a particular input graph, but not for another 
+similar sized graph, why is that?
+--------------------------------------------------------------------------------
+
+A11. Presently, our distribution is vertex-based. That means a process owns N/p 
+vertices and all the edges connected to those N/p vertices (including ghost 
+vertices). Load imbalances are very probable in this type of distribution, 
+depending on the graph structure. 
+
+As an example, let's say there is a large (real-world) graph, and its structure 
+is such that only a few processes end up owning a majority of edges, as per 
+miniVite graph data distribution. Also, let's assume that the graph has either a 
+very poor community structure (modularity close to 0) or a very stable community 
+structure (modularity close to 1 after a few iterations, which means not many 
+vertices are migrating to neighboring communities). In both these cases, 
+community detection in miniVite will run for relatively few 
+iterations, which may affect the overall scalability.
diff --git a/miniVite/LICENSE b/miniVite/LICENSE
new file mode 100644
index 0000000..4959d64
--- /dev/null
+++ b/miniVite/LICENSE
@@ -0,0 +1,29 @@
+BSD 3-Clause License
+
+Copyright (c) 2018, Battelle Memorial Institute
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+* Redistributions of source code must retain the above copyright notice, this
+  list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright notice,
+  this list of conditions and the following disclaimer in the documentation
+  and/or other materials provided with the distribution.
+
+* Neither the name of the copyright holder nor the names of its
+  contributors may be used to endorse or promote products derived from
+  this software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/miniVite/Makefile b/miniVite/Makefile
new file mode 100644
index 0000000..a5e51f4
--- /dev/null
+++ b/miniVite/Makefile
@@ -0,0 +1,33 @@
+CXX = mpicxx
+# use -xmic-avx512 instead of -xHost for Intel Xeon Phi platforms
+PLUGIN_FLAG = -Xclang -load -Xclang ~/git/unifiedmem/code/llvm-pass/build/uvm/libOMPPass.so
+#OPTFLAGS = -O3 -xHost -qopenmp -DCHECK_NUM_EDGES #-DPRINT_EXTRA_NEDGES #-DPRINT_DIST_STATS #-DUSE_MPI_RMA -DUSE_MPI_ACCUMULATE #-DUSE_32_BIT_GRAPH #-DDEBUG_PRINTF #-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_LOHI_RANDOM_NUMBERS#-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_RANDOM_NUMBERS #-DPRINT_RANDOM_XY_COORD
+OPTFLAGS = -O3 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -DOMP_GPU -DOMP_GPU_ALLOC -DCHECK_NUM_EDGES #-DPRINT_EXTRA_NEDGES #-DPRINT_DIST_STATS #-DUSE_MPI_RMA -DUSE_MPI_ACCUMULATE #-DUSE_32_BIT_GRAPH #-DDEBUG_PRINTF #-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_LOHI_RANDOM_NUMBERS#-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_RANDOM_NUMBERS #-DPRINT_RANDOM_XY_COORD
+#OPTFLAGS = -O3 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -DOMP_GPU -DCHECK_NUM_EDGES #-DPRINT_EXTRA_NEDGES #-DPRINT_DIST_STATS #-DUSE_MPI_RMA -DUSE_MPI_ACCUMULATE #-DUSE_32_BIT_GRAPH #-DDEBUG_PRINTF #-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_LOHI_RANDOM_NUMBERS#-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_RANDOM_NUMBERS #-DPRINT_RANDOM_XY_COORD
+#OPTFLAGS = -O3 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -DOMP_GPU -DCHECK_NUM_EDGES -DDEBUG_PRINTF #-DPRINT_EXTRA_NEDGES #-DPRINT_DIST_STATS #-DUSE_MPI_RMA -DUSE_MPI_ACCUMULATE #-DUSE_32_BIT_GRAPH #-DDEBUG_PRINTF #-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_LOHI_RANDOM_NUMBERS#-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_RANDOM_NUMBERS #-DPRINT_RANDOM_XY_COORD
+#OPTFLAGS = -O3 -fopenmp -DOMP_GPU -DCHECK_NUM_EDGES -DDEBUG_PRINTF #-DPRINT_EXTRA_NEDGES #-DPRINT_DIST_STATS #-DUSE_MPI_RMA -DUSE_MPI_ACCUMULATE #-DUSE_32_BIT_GRAPH #-DDEBUG_PRINTF #-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_LOHI_RANDOM_NUMBERS#-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_RANDOM_NUMBERS #-DPRINT_RANDOM_XY_COORD
+#OPTFLAGS = -O3 -fopenmp -DCHECK_NUM_EDGES -DDEBUG_PRINTF #-DPRINT_EXTRA_NEDGES #-DPRINT_DIST_STATS #-DUSE_MPI_RMA -DUSE_MPI_ACCUMULATE #-DUSE_32_BIT_GRAPH #-DDEBUG_PRINTF #-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_LOHI_RANDOM_NUMBERS#-DUSE_MPI_RMA #-DPRINT_LCG_DOUBLE_RANDOM_NUMBERS #-DPRINT_RANDOM_XY_COORD
+#-DUSE_MPI_SENDRECV
+#-DUSE_MPI_COLLECTIVES
+# use export ASAN_OPTIONS=verbosity=1 to check ASAN output
+SNTFLAGS = -std=c++11 -fopenmp -fsanitize=address -O1 -fno-omit-frame-pointer
+CXXFLAGS = -std=c++11 -g $(OPTFLAGS)
+
+OBJ = main.o
+TARGET = miniVite
+
+all: $(TARGET)
+
+%.o: %.cpp
+	$(CXX) $(CXXFLAGS) $(PLUGIN_FLAG) -c -o $@ $^
+
+%.ll: %.cpp
+	$(CXX) $(CXXFLAGS) $(PLUGIN_FLAG) -emit-llvm -S -c -o $@ $^
+
+$(TARGET):  $(OBJ)
+	$(CXX) $^ $(OPTFLAGS) -o $@
+
+.PHONY: clean
+
+clean:
+	rm -rf *~ $(OBJ) $(TARGET) *.ll
diff --git a/miniVite/README b/miniVite/README
new file mode 100644
index 0000000..f320dec
--- /dev/null
+++ b/miniVite/README
@@ -0,0 +1,138 @@
+************************
+miniVite (/mini/ˈviːte/)
+
+Version: 1.0
+************************
+
+*******
+-------
+ ABOUT
+-------
+*******
+miniVite is a proxy app that implements a single phase of the Louvain 
+method in distributed memory for graph community detection. Please 
+refer to the following paper for a detailed discussion of the
+distributed memory Louvain method implementation:
+https://ieeexplore.ieee.org/abstract/document/8425242/
+
+Apart from real world graphs, users can use specific options 
+to generate a Random Geometric Graph (RGG) in parallel.
+RGGs are known to have good community structure:
+https://arxiv.org/pdf/1604.03993.pdf
+
+With our parallel RGG generator, vertices 
+owned by a process will only have cross edges with its logical
+neighboring processes (each process owning a 1x1/p chunk of the
+1x1 unit square). If MPI process mapping is such that consecutive 
+processes (e.g., p and p+1) are physically close to each other, 
+then there is not much communication stress in the application. 
+Therefore, we allow an option to add extra edges between randomly 
+chosen vertices, whose owners may be physically far apart. 
+
+We require the total number of processes to be a power of 2 and 
+the total number of vertices to be perfectly divisible by the number of 
+processes when parallel RGG generation options are used. 
+This constraint does not apply to real world graphs passed to miniVite.
+
+We also allow users to pass any real world graph as input. However,
+we expect an input graph to be in a certain binary format, which
+we have observed to be more efficient than reading ASCII format
+files. The code for binary conversion (from a variety of common
+graph formats) is packaged separately with Vite, which is our
+full implementation of the Louvain method in distributed memory.
+Please follow the instructions in the Vite README for binary file 
+conversion.
+
+Vite can be downloaded from:
+http://hpc.pnl.gov/people/hala/grappolo.html
+
+Unlike Vite, we do not implement any heuristics to improve the
+performance of the Louvain method. miniVite is a baseline parallel
+version, implementing only the first phase of the Louvain method.
+
+This code requires an MPI library (preferably MPI-3 compatible) 
+and a C++11-compliant compiler for building. 
+
+Please contact the following for any queries or support:
+
+Sayan Ghosh, WSU (zsayanz at gmail dot com)
+Mahantesh Halappanavar, PNNL (hala at pnnl dot gov)
+
+*************
+-------------
+ COMPILATION
+-------------
+*************
+Please update the Makefile with compiler flags and use a C++11 compliant 
+compiler of your choice. Invoke `make clean; make` after setting paths 
+to MPI for generating the binary. Use `mpirun` or `mpiexec` or `srun`
+to execute the code with specific runtime arguments mentioned in the
+next section.
+
+Pass -DPRINT_DIST_STATS for printing distributed graph 
+characteristics.
+
+Pass -DDEBUG_PRINTF if detailed diagnostics are required during
+the program run. This program requires OpenMP and C++11 support,
+so pass -fopenmp (for g++)/-qopenmp (for icpc) and -std=c++11/
+-std=c++0x.
+
+Pass -DUSE_32_BIT_GRAPH if the number of nodes in the graph is 
+within the 32-bit range (2 x 10^9), else the 64-bit range is assumed.
+
+Pass -DOMP_SCHEDULE_RUNTIME if you want to set OMP_SCHEDULE
+for all parallel regions at runtime. If -DOMP_SCHEDULE_RUNTIME 
+is passed, and OMP_SCHEDULE is not set, then the default schedule will
+be chosen (which is most probably "static" or "guided" for most of 
+the OpenMP regions).
+
+Communicating vertex-community information (per iteration) 
+is the most expensive step of our distributed Louvain 
+implementation. We use one of the following MPI communication 
+primitives for communicating vertex-community information during a 
+Louvain iteration, which can be enabled by passing predefined
+macros at compile time:
+
+1. MPI Collectives:  -DUSE_MPI_COLLECTIVES
+2. MPI Send-Receive: -DUSE_MPI_SENDRECV
+3. MPI RMA:          -DUSE_MPI_RMA (using -DUSE_MPI_ACCUMULATE 
+                     additionally ensures atomic put) 
+4. Default:          Uses MPI point-to-point nonblocking API.
+
+Apart from these, we use MPI (blocking) collectives, mostly
+MPI_Alltoall.
+
+There are other predefined macros in the code as well for printing
+intermediate results or checking correctness or using a particular
+C++ data structure. 
+
+***********************
+-----------------------
+ EXECUTING THE PROGRAM
+-----------------------
+***********************
+
+E.g.: 
+mpiexec -n 2 bin/./minivite -f karate.bin
+mpiexec -n 2 bin/./minivite -l -n 100
+mpiexec -n 2 bin/./minivite -n 100
+mpiexec -n 2 bin/./minivite -p 2 -n 100
+
+Possible options (can be combined):
+
+1. -f <bin-file>   : Specify input binary file after this argument. 
+2. -n <vertices>   : Pass total number of vertices of the generated graph.
+3. -l              : Use distributed LCG for randomly choosing edges. If this option 
+                     is not used, we will use C++ random number generator (using 
+                     std::default_random_engine).
+4. -p <percent>    : Specify percent of overall edges to be randomly generated between
+                     processes.
+5. -t <threshold>  : Specify threshold quantity (default: 1.0E-06) used to determine the 
+                     exit criteria in an iteration of Louvain method.
+6. -w              : Use Euclidean distance as edge weight. If this option is not used,
+                     edge weights are considered as 1.0. Generate edge weight uniformly 
+                     between (0,1) if Euclidean distance is not available (applicable to 
+                     randomly generated edges).                    
+7. -r <nranks>     : This is used to control the number of aggregators in MPI I/O and is
+                     meaningful when an input binary graph file is passed with option "-f".
+                     naggr := (nranks > 1) ? (nprocs/nranks) : nranks;
diff --git a/miniVite/dspl.hpp b/miniVite/dspl.hpp
new file mode 100644
index 0000000..f86ae90
--- /dev/null
+++ b/miniVite/dspl.hpp
@@ -0,0 +1,1392 @@
+// ***********************************************************************
+//
+//                              miniVite
+//
+// ***********************************************************************
+//
+//       Copyright (2018) Battelle Memorial Institute
+//                      All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions
+// are met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the copyright holder nor the names of its
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+// COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+// BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+// ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+// POSSIBILITY OF SUCH DAMAGE.
+//
+// ************************************************************************ 
+
+#pragma once
+#ifndef DSPL_HPP
+#define DSPL_HPP
+
+#include <algorithm>
+#include <fstream>
+#include <functional>
+#include <iostream>
+#include <list>
+#include <numeric>
+#include <vector>
+#include <unordered_map>
+#include <unordered_set>
+#include <map>
+
+#include <mpi.h>
+#include <omp.h>
+
+#include "graph.hpp"
+#include "utils.hpp"
+
+struct Comm {
+  GraphElem size;
+  GraphWeight degree;
+
+  Comm() : size(0), degree(0.0) {}
+};
+
+struct CommInfo {
+    GraphElem community;
+    GraphElem size;
+    GraphWeight degree;
+};
+
+const int SizeTag           = 1;
+const int VertexTag         = 2;
+const int CommunityTag      = 3;
+const int CommunitySizeTag  = 4;
+const int CommunityDataTag  = 5;
+
+static MPI_Datatype commType;
+
+void distSumVertexDegree(const Graph &g, std::vector<GraphWeight> &vDegree, std::vector<Comm> &localCinfo)
+{
+  const GraphElem nv = g.get_lnv();
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for default(none), shared(g, vDegree, localCinfo), schedule(runtime)
+#else
+#pragma omp parallel for default(none), shared(g, vDegree, localCinfo), schedule(guided)
+#endif
+  for (GraphElem i = 0; i < nv; i++) {
+    GraphElem e0, e1;
+    GraphWeight tw = 0.0;
+
+    g.edge_range(i, e0, e1);
+
+    for (GraphElem k = e0; k < e1; k++) {
+      const Edge &edge = g.get_edge(k);
+      tw += edge.weight_;
+    }
+
+    vDegree[i] = tw;
+   
+    localCinfo[i].degree = tw;
+    localCinfo[i].size = 1L;
+  }
+} // distSumVertexDegree
+
+GraphWeight distCalcConstantForSecondTerm(const std::vector<GraphWeight> &vDegree, MPI_Comm gcomm)
+{
+  GraphWeight totalEdgeWeightTwice = 0.0;
+  GraphWeight localWeight = 0.0;
+  int me = -1;
+
+  const size_t vsz = vDegree.size();
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for default(none), shared(vDegree), reduction(+: localWeight) schedule(runtime)
+#else
+#pragma omp parallel for default(none), shared(vDegree), reduction(+: localWeight) schedule(static)
+#endif  
+  for (GraphElem i = 0; i < vsz; i++)
+    localWeight += vDegree[i]; // Local reduction
+
+  // Global reduction
+  MPI_Allreduce(&localWeight, &totalEdgeWeightTwice, 1, 
+          MPI_WEIGHT_TYPE, MPI_SUM, gcomm);
+
+  return (1.0 / static_cast<GraphWeight>(totalEdgeWeightTwice));
+} // distCalcConstantForSecondTerm
+
+void distInitComm(std::vector<GraphElem> &pastComm, std::vector<GraphElem> &currComm, const GraphElem base)
+{
+  const size_t csz = currComm.size();
+
+#ifdef DEBUG_PRINTF  
+  assert(csz == pastComm.size());
+#endif
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for default(none), shared(pastComm, currComm), schedule(runtime)
+#else
+#pragma omp parallel for default(none), shared(pastComm, currComm), schedule(static)
+#endif
+  for (GraphElem i = 0L; i < csz; i++) {
+    pastComm[i] = i + base;
+    currComm[i] = i + base;
+  }
+} // distInitComm
+
+void distInitLouvain(const Graph &dg, std::vector<GraphElem> &pastComm, 
+        std::vector<GraphElem> &currComm, std::vector<GraphWeight> &vDegree, 
+        std::vector<GraphWeight> &clusterWeight, std::vector<Comm> &localCinfo, 
+        std::vector<Comm> &localCupdate, GraphWeight &constantForSecondTerm,
+        const int me)
+{
+  const GraphElem base = dg.get_base(me);
+  const GraphElem nv = dg.get_lnv();
+  MPI_Comm gcomm = dg.get_comm();
+
+  vDegree.resize(nv);
+  pastComm.resize(nv);
+  currComm.resize(nv);
+  clusterWeight.resize(nv);
+  localCinfo.resize(nv);
+  localCupdate.resize(nv);
+ 
+  distSumVertexDegree(dg, vDegree, localCinfo);
+  constantForSecondTerm = distCalcConstantForSecondTerm(vDegree, gcomm);
+
+  distInitComm(pastComm, currComm, base);
+} // distInitLouvain
+
+GraphElem distGetMaxIndex(const std::unordered_map<GraphElem, GraphElem> &clmap, const std::vector<GraphWeight> &counter,
+			  const GraphWeight selfLoop, const std::vector<Comm> &localCinfo, 
+			  const std::map<GraphElem,Comm> &remoteCinfo, const GraphWeight vDegree, 
+                          const GraphElem currSize, const GraphWeight currDegree, const GraphElem currComm,
+			  const GraphElem base, const GraphElem bound, const GraphWeight constant)
+{
+  std::unordered_map<GraphElem, GraphElem>::const_iterator storedAlready;
+  GraphElem maxIndex = currComm;
+  GraphWeight curGain = 0.0, maxGain = 0.0;
+  GraphWeight eix = static_cast<GraphWeight>(counter[0]) - static_cast<GraphWeight>(selfLoop);
+
+  GraphWeight ax = currDegree - vDegree;
+  GraphWeight eiy = 0.0, ay = 0.0;
+
+  GraphElem maxSize = currSize; 
+  GraphElem size = 0;
+
+  storedAlready = clmap.begin();
+#ifdef DEBUG_PRINTF  
+  assert(storedAlready != clmap.end());
+#endif
+  do {
+      if (currComm != storedAlready->first) {
+
+          // is_local, direct access local info
+          if ((storedAlready->first >= base) && (storedAlready->first < bound)) {
+              ay = localCinfo[storedAlready->first-base].degree;
+              size = localCinfo[storedAlready->first - base].size;   
+          }
+          else {
+              // is_remote, lookup map
+              std::map<GraphElem,Comm>::const_iterator citer = remoteCinfo.find(storedAlready->first);
+              ay = citer->second.degree;
+              size = citer->second.size; 
+          }
+
+          eiy = counter[storedAlready->second];
+
+          curGain = 2.0 * (eiy - eix) - 2.0 * vDegree * (ay - ax) * constant;
+
+          if ((curGain > maxGain) ||
+                  ((curGain == maxGain) && (curGain != 0.0) && (storedAlready->first < maxIndex))) {
+              maxGain = curGain;
+              maxIndex = storedAlready->first;
+              maxSize = size;
+          }
+      }
+      storedAlready++;
+  } while (storedAlready != clmap.end());
+
+  if ((maxSize == 1) && (currSize == 1) && (maxIndex > currComm))
+    maxIndex = currComm;
+
+  return maxIndex;
+} // distGetMaxIndex
+
+GraphWeight distBuildLocalMapCounter(const GraphElem e0, const GraphElem e1, std::unordered_map<GraphElem, GraphElem> &clmap, 
+				   std::vector<GraphWeight> &counter, const Graph &g, 
+                                   const std::vector<GraphElem> &currComm, 
+                                   const std::unordered_map<GraphElem, GraphElem> &remoteComm,
+	                           const GraphElem vertex, const GraphElem base, const GraphElem bound)
+{
+  GraphElem numUniqueClusters = 1L;
+  GraphWeight selfLoop = 0;
+  std::unordered_map<GraphElem, GraphElem>::const_iterator storedAlready;
+
+  for (GraphElem j = e0; j < e1; j++) {
+        
+    const Edge &edge = g.get_edge(j);
+    const GraphElem &tail_ = edge.tail_;
+    const GraphWeight &weight = edge.weight_;
+    GraphElem tcomm;
+
+    if (tail_ == vertex + base)
+      selfLoop += weight;
+
+    // is_local, direct access local std::vector<GraphElem>
+    if ((tail_ >= base) && (tail_ < bound))
+      tcomm = currComm[tail_ - base];
+    else { // is_remote, lookup map
+      std::unordered_map<GraphElem, GraphElem>::const_iterator iter = remoteComm.find(tail_);
+
+#ifdef DEBUG_PRINTF  
+      assert(iter != remoteComm.end());
+#endif
+      tcomm = iter->second;
+    }
+
+    storedAlready = clmap.find(tcomm);
+    
+    if (storedAlready != clmap.end())
+      counter[storedAlready->second] += weight;
+    else {
+        clmap.insert(std::unordered_map<GraphElem, GraphElem>::value_type(tcomm, numUniqueClusters));
+        counter.push_back(weight);
+        numUniqueClusters++;
+    }
+  }
+
+  return selfLoop;
+} // distBuildLocalMapCounter
+
+void distExecuteLouvainIteration(const GraphElem i, const Graph &dg, const std::vector<GraphElem> &currComm,
+				 std::vector<GraphElem> &targetComm, const std::vector<GraphWeight> &vDegree,
+                                 std::vector<Comm> &localCinfo, std::vector<Comm> &localCupdate,
+				 const std::unordered_map<GraphElem, GraphElem> &remoteComm, 
+                                 const std::map<GraphElem,Comm> &remoteCinfo, 
+                                 std::map<GraphElem,Comm> &remoteCupdate, const GraphWeight constantForSecondTerm,
+                                 std::vector<GraphWeight> &clusterWeight, const int me)
+{
+  GraphElem localTarget = -1;
+  GraphElem e0, e1;
+  GraphWeight selfLoop = 0.0; // a weight, not an index: keep fractional self-loop weights intact
+  std::unordered_map<GraphElem, GraphElem> clmap;
+  std::vector<GraphWeight> counter;
+
+  const GraphElem base = dg.get_base(me), bound = dg.get_bound(me);
+  const GraphElem cc = currComm[i];
+  GraphWeight ccDegree;
+  GraphElem ccSize;  
+  bool currCommIsLocal = false; 
+  bool targetCommIsLocal = false;
+
+  // current community is local
+  if (cc >= base && cc < bound) {
+      ccDegree = localCinfo[cc-base].degree;
+      ccSize = localCinfo[cc-base].size;
+      currCommIsLocal = true;
+  } else {
+      // current community is remote
+      std::map<GraphElem,Comm>::const_iterator citer = remoteCinfo.find(cc);
+      ccDegree = citer->second.degree;
+      ccSize = citer->second.size;
+      currCommIsLocal = false;
+  }
+
+  dg.edge_range(i, e0, e1);
+
+  if (e0 != e1) {
+    clmap.insert(std::unordered_map<GraphElem, GraphElem>::value_type(cc, 0));
+    counter.push_back(0.0);
+
+    selfLoop =  distBuildLocalMapCounter(e0, e1, clmap, counter, dg, 
+                    currComm, remoteComm, i, base, bound);
+
+    clusterWeight[i] += counter[0];
+
+    localTarget = distGetMaxIndex(clmap, counter, selfLoop, localCinfo, remoteCinfo, 
+                    vDegree[i], ccSize, ccDegree, cc, base, bound, constantForSecondTerm);
+  }
+  else
+    localTarget = cc;
+
+   // is the Target Local?
+   if (localTarget >= base && localTarget < bound)
+      targetCommIsLocal = true;
+  
+  // current and target comm are local - atomic updates to vectors
+  if ((localTarget != cc) && (localTarget != -1) && currCommIsLocal && targetCommIsLocal) {
+        
+#ifdef DEBUG_PRINTF  
+        assert((localTarget >= base) && (localTarget < bound));
+        assert((cc >= base) && (cc < bound));
+        assert((cc - base) < static_cast<GraphElem>(localCupdate.size()));
+        assert((localTarget - base) < static_cast<GraphElem>(localCupdate.size()));
+#endif
+        #pragma omp atomic update
+        localCupdate[localTarget-base].degree += vDegree[i];
+        #pragma omp atomic update
+        localCupdate[localTarget-base].size++;
+        #pragma omp atomic update
+        localCupdate[cc-base].degree -= vDegree[i];
+        #pragma omp atomic update
+        localCupdate[cc-base].size--;
+     }	
+
+  // current is local, target is not - do atomic on local, accumulate in Maps for remote
+  if ((localTarget != cc) && (localTarget != -1) && currCommIsLocal && !targetCommIsLocal) {
+        #pragma omp atomic update
+        localCupdate[cc-base].degree -= vDegree[i];
+        #pragma omp atomic update
+        localCupdate[cc-base].size--;
+ 
+        // search target!     
+        std::map<GraphElem,Comm>::iterator iter=remoteCupdate.find(localTarget);
+ 
+        #pragma omp atomic update
+        iter->second.degree += vDegree[i];
+        #pragma omp atomic update
+        iter->second.size++;
+  }
+        
+   // current is remote, target is local - accumulate for current, atomic on local
+   if ((localTarget != cc) && (localTarget != -1) && !currCommIsLocal && targetCommIsLocal) {
+        #pragma omp atomic update
+        localCupdate[localTarget-base].degree += vDegree[i];
+        #pragma omp atomic update
+        localCupdate[localTarget-base].size++;
+       
+        // search current 
+        std::map<GraphElem,Comm>::iterator iter=remoteCupdate.find(cc);
+  
+        #pragma omp atomic update
+        iter->second.degree -= vDegree[i];
+        #pragma omp atomic update
+        iter->second.size--;
+   }
+                    
+   // current and target are remote - accumulate for both
+   if ((localTarget != cc) && (localTarget != -1) && !currCommIsLocal && !targetCommIsLocal) {
+       
+        // search current 
+        std::map<GraphElem,Comm>::iterator iter = remoteCupdate.find(cc);
+  
+        #pragma omp atomic update
+        iter->second.degree -= vDegree[i];
+        #pragma omp atomic update
+        iter->second.size--;
+   
+        // search target
+        iter=remoteCupdate.find(localTarget);
+  
+        #pragma omp atomic update
+        iter->second.degree += vDegree[i];
+        #pragma omp atomic update
+        iter->second.size++;
+   }
+
+#ifdef DEBUG_PRINTF  
+  assert(localTarget != -1);
+#endif
+  targetComm[i] = localTarget;
+} // distExecuteLouvainIteration
+
+GraphWeight distComputeModularity(const Graph &g, std::vector<Comm> &localCinfo,
+			     const std::vector<GraphWeight> &clusterWeight,
+			     const GraphWeight constantForSecondTerm,
+			     const int me)
+{
+  const GraphElem nv = g.get_lnv();
+  MPI_Comm gcomm = g.get_comm();
+
+  GraphWeight le_la_xx[2];
+  GraphWeight e_a_xx[2] = {0.0, 0.0};
+  GraphWeight le_xx = 0.0, la2_x = 0.0;
+
+#ifdef DEBUG_PRINTF  
+  assert((clusterWeight.size() == nv));
+#endif
+
+// default(shared): the const loop bound nv is not listed in any clause, which
+// newer compilers reject under default(none)
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for default(shared), shared(clusterWeight, localCinfo), \
+  reduction(+: le_xx), reduction(+: la2_x) schedule(runtime)
+#else
+#pragma omp parallel for default(shared), shared(clusterWeight, localCinfo), \
+  reduction(+: le_xx), reduction(+: la2_x) schedule(static)
+#endif
+  for (GraphElem i = 0L; i < nv; i++) {
+    le_xx += clusterWeight[i];
+    la2_x += static_cast<GraphWeight>(localCinfo[i].degree) * static_cast<GraphWeight>(localCinfo[i].degree); 
+  } 
+  le_la_xx[0] = le_xx;
+  le_la_xx[1] = la2_x;
+
+#ifdef DEBUG_PRINTF  
+  const double t0 = MPI_Wtime();
+#endif
+
+  MPI_Allreduce(le_la_xx, e_a_xx, 2, MPI_WEIGHT_TYPE, MPI_SUM, gcomm);
+
+#ifdef DEBUG_PRINTF  
+  const double t1 = MPI_Wtime();
+#endif
+
+  GraphWeight currMod = (e_a_xx[0] * constantForSecondTerm) - 
+      (e_a_xx[1] * constantForSecondTerm * constantForSecondTerm);
+#ifdef DEBUG_PRINTF  
+  std::cout << "[" << me << "]le_xx: " << le_xx << ", la2_x: " << la2_x << std::endl;
+  std::cout << "[" << me << "]e_xx: " << e_a_xx[0] << ", a2_x: " << e_a_xx[1] << ", currMod: " << currMod << std::endl;
+  std::cout << "[" << me << "]Reduction time: " << (t1 - t0) << std::endl;
+#endif
+
+  return currMod;
+} // distComputeModularity
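+
+// Worked example for distComputeModularity (illustrative only). Assumptions:
+// a single process, constantForSecondTerm = 1/(2m) with 2m the total weighted
+// degree, and every undirected edge stored once per endpoint.
+//
+//   Graph: 4 vertices, unit-weight edges 0-1 and 2-3      =>  2m = 4, c = 0.25
+//   Communities: {0,1} labeled 0 and {2,3} labeled 2
+//
+//   clusterWeight   = {1, 1, 1, 1}   =>  e_xx = 4
+//   community degree = {2, 0, 2, 0}  =>  a2_x = 2*2 + 2*2 = 8
+//
+//   currMod = e_xx * c - a2_x * c * c
+//           = 4 * 0.25 - 8 * 0.0625 = 0.5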
+
+void distUpdateLocalCinfo(std::vector<Comm> &localCinfo, const std::vector<Comm> &localCupdate)
+{
+    size_t csz = localCinfo.size();
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp for schedule(runtime)
+#else
+#pragma omp for schedule(static)
+#endif
+    for (GraphElem i = 0L; i < csz; i++) {
+        localCinfo[i].size += localCupdate[i].size;
+        localCinfo[i].degree += localCupdate[i].degree;
+    }
+}
+
+void distCleanCWandCU(const GraphElem nv, std::vector<GraphWeight> &clusterWeight,
+        std::vector<Comm> &localCupdate)
+{
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp for schedule(runtime)
+#else
+#pragma omp for schedule(static)
+#endif
+    for (GraphElem i = 0L; i < nv; i++) {
+        clusterWeight[i] = 0;
+        localCupdate[i].degree = 0;
+        localCupdate[i].size = 0;
+    }
+} // distCleanCWandCU
+
+#if defined(USE_MPI_RMA)
+void fillRemoteCommunities(const Graph &dg, const int me, const int nprocs,
+        const size_t &ssz, const size_t &rsz, const std::vector<GraphElem> &ssizes, 
+        const std::vector<GraphElem> &rsizes, const std::vector<GraphElem> &svdata, 
+        const std::vector<GraphElem> &rvdata, const std::vector<GraphElem> &currComm, 
+        const std::vector<Comm> &localCinfo, std::map<GraphElem,Comm> &remoteCinfo, 
+        std::unordered_map<GraphElem, GraphElem> &remoteComm, std::map<GraphElem,Comm> &remoteCupdate, 
+        const MPI_Win &commwin, const std::vector<MPI_Aint> &disp)
+#else
+void fillRemoteCommunities(const Graph &dg, const int me, const int nprocs,
+        const size_t &ssz, const size_t &rsz, const std::vector<GraphElem> &ssizes, 
+        const std::vector<GraphElem> &rsizes, const std::vector<GraphElem> &svdata, 
+        const std::vector<GraphElem> &rvdata, const std::vector<GraphElem> &currComm, 
+        const std::vector<Comm> &localCinfo, std::map<GraphElem,Comm> &remoteCinfo, 
+        std::unordered_map<GraphElem, GraphElem> &remoteComm, std::map<GraphElem,Comm> &remoteCupdate)
+#endif
+{
+#if defined(USE_MPI_RMA)
+    std::vector<GraphElem> scdata(ssz);
+#else
+    std::vector<GraphElem> rcdata(rsz), scdata(ssz);
+#endif
+  GraphElem spos, rpos;
+#if defined(REPLACE_STL_UOSET_WITH_VECTOR)
+  std::vector< std::vector< GraphElem > > rcinfo(nprocs);
+#else
+  std::vector<std::unordered_set<GraphElem> > rcinfo(nprocs);
+#endif
+
+#if defined(USE_MPI_SENDRECV)
+#else
+  std::vector<MPI_Request> rreqs(nprocs), sreqs(nprocs);
+#endif
+
+#ifdef DEBUG_PRINTF  
+  double t0, t1, ta = 0.0;
+#endif
+
+  const GraphElem base = dg.get_base(me), bound = dg.get_bound(me);
+  const GraphElem nv = dg.get_lnv();
+  MPI_Comm gcomm = dg.get_comm();
+
+  // Collects Communities of local vertices for remote nodes
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for shared(svdata, scdata, currComm) schedule(runtime)
+#else
+#pragma omp parallel for shared(svdata, scdata, currComm) schedule(static)
+#endif
+  for (GraphElem i = 0; i < ssz; i++) {
+    const GraphElem vertex = svdata[i];
+#ifdef DEBUG_PRINTF  
+    assert((vertex >= base) && (vertex < bound));
+#endif
+    const GraphElem comm = currComm[vertex - base];
+    scdata[i] = comm;
+  }
+
+  std::vector<GraphElem> rcsizes(nprocs), scsizes(nprocs);
+  std::vector<CommInfo> sinfo, rinfo;
+
+#ifdef DEBUG_PRINTF  
+  t0 = MPI_Wtime();
+#endif
+  spos = 0;
+  rpos = 0;
+#if defined(USE_MPI_COLLECTIVES)
+  std::vector<int> scnts(nprocs), rcnts(nprocs), sdispls(nprocs), rdispls(nprocs);
+  for (int i = 0; i < nprocs; i++) {
+      scnts[i] = ssizes[i];
+      rcnts[i] = rsizes[i];
+      sdispls[i] = spos;
+      rdispls[i] = rpos;
+      spos += scnts[i];
+      rpos += rcnts[i];
+  }
+  scnts[me] = 0;
+  rcnts[me] = 0;
+  MPI_Alltoallv(scdata.data(), scnts.data(), sdispls.data(), 
+          MPI_GRAPH_TYPE, rcdata.data(), rcnts.data(), rdispls.data(), 
+          MPI_GRAPH_TYPE, gcomm);
+#elif defined(USE_MPI_RMA)
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me) {
+#if defined(USE_MPI_ACCUMULATE)
+          MPI_Accumulate(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, 
+                  disp[i], ssizes[i], MPI_GRAPH_TYPE, MPI_REPLACE, commwin);
+#else
+          MPI_Put(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, 
+                  disp[i], ssizes[i], MPI_GRAPH_TYPE, commwin);
+#endif
+      }
+      spos += ssizes[i];
+      rpos += rsizes[i];
+  }
+#elif defined(USE_MPI_SENDRECV)
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me)
+          MPI_Sendrecv(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 
+                  rcdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 
+                  gcomm, MPI_STATUS_IGNORE);
+
+      spos += ssizes[i];
+      rpos += rsizes[i];
+  }
+#else
+  for (int i = 0; i < nprocs; i++) {
+    if (i != me)
+      MPI_Irecv(rcdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, 
+              CommunityTag, gcomm, &rreqs[i]);
+    else
+      rreqs[i] = MPI_REQUEST_NULL;
+
+    rpos += rsizes[i];
+  }
+  for (int i = 0; i < nprocs; i++) {
+    if (i != me)
+      MPI_Isend(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, 
+              CommunityTag, gcomm, &sreqs[i]);
+    else
+      sreqs[i] = MPI_REQUEST_NULL;
+
+    spos += ssizes[i];
+  }
+
+  MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE);
+  MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE);
+#endif
+#ifdef DEBUG_PRINTF  
+  t1 = MPI_Wtime();
+  ta += (t1 - t0);
+#endif
+
+  // reserve vectors
+#if defined(REPLACE_STL_UOSET_WITH_VECTOR)
+  for (GraphElem i = 0; i < nprocs; i++) {
+      rcinfo[i].reserve(rpos);
+  }
+#endif
+
+  // fetch baseptr from MPI window
+#if defined(USE_MPI_RMA)
+  MPI_Win_flush_all(commwin);
+  MPI_Barrier(gcomm);
+
+  GraphElem *rcbuf = nullptr;
+  int flag = 0;
+  MPI_Win_get_attr(commwin, MPI_WIN_BASE, &rcbuf, &flag);
+#endif
+
+  remoteComm.clear();
+  for (GraphElem i = 0; i < rpos; i++) {
+
+#if defined(USE_MPI_RMA)
+    const GraphElem comm = rcbuf[i];
+#else
+    const GraphElem comm = rcdata[i];
+#endif
+
+    remoteComm.insert(std::unordered_map<GraphElem, GraphElem>::value_type(rvdata[i], comm));
+    const int tproc = dg.get_owner(comm);
+
+    if (tproc != me)
+#if defined(REPLACE_STL_UOSET_WITH_VECTOR)
+      rcinfo[tproc].emplace_back(comm);
+#else
+      rcinfo[tproc].insert(comm);
+#endif
+  }
+
+  for (GraphElem i = 0; i < nv; i++) {
+    const GraphElem comm = currComm[i];
+    const int tproc = dg.get_owner(comm);
+
+    if (tproc != me)
+#if defined(REPLACE_STL_UOSET_WITH_VECTOR)
+      rcinfo[tproc].emplace_back(comm);
+#else
+      rcinfo[tproc].insert(comm);
+#endif
+  }
+
+#ifdef DEBUG_PRINTF  
+  t0 = MPI_Wtime();
+#endif
+  GraphElem stcsz = 0, rtcsz = 0;
+  
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for shared(scsizes, rcinfo) \
+  reduction(+:stcsz) schedule(runtime)
+#else
+#pragma omp parallel for shared(scsizes, rcinfo) \
+  reduction(+:stcsz) schedule(static)
+#endif
+  for (int i = 0; i < nprocs; i++) {
+    scsizes[i] = rcinfo[i].size();
+    stcsz += scsizes[i];
+  }
+
+  MPI_Alltoall(scsizes.data(), 1, MPI_GRAPH_TYPE, rcsizes.data(), 
+          1, MPI_GRAPH_TYPE, gcomm);
+
+#ifdef DEBUG_PRINTF  
+  t1 = MPI_Wtime();
+  ta += (t1 - t0);
+#endif
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for shared(rcsizes) \
+  reduction(+:rtcsz) schedule(runtime)
+#else
+#pragma omp parallel for shared(rcsizes) \
+  reduction(+:rtcsz) schedule(static)
+#endif
+  for (int i = 0; i < nprocs; i++) {
+    rtcsz += rcsizes[i];
+  }
+
+#ifdef DEBUG_PRINTF  
+  std::cout << "[" << me << "]Total communities to receive: " << rtcsz << std::endl;
+#endif
+#if defined(USE_MPI_COLLECTIVES)
+  std::vector<GraphElem> rcomms(rtcsz), scomms(stcsz);
+#else
+#if defined(REPLACE_STL_UOSET_WITH_VECTOR)
+  std::vector<GraphElem> rcomms(rtcsz);
+#else
+  std::vector<GraphElem> rcomms(rtcsz), scomms(stcsz);
+#endif
+#endif
+  sinfo.resize(rtcsz);
+  rinfo.resize(stcsz);
+
+#ifdef DEBUG_PRINTF  
+  t0 = MPI_Wtime();
+#endif
+  spos = 0;
+  rpos = 0;
+#if defined(USE_MPI_COLLECTIVES)
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me) {
+          std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos);
+      }
+      scnts[i] = scsizes[i];
+      rcnts[i] = rcsizes[i];
+      sdispls[i] = spos;
+      rdispls[i] = rpos;
+      spos += scnts[i];
+      rpos += rcnts[i];
+  }
+  scnts[me] = 0;
+  rcnts[me] = 0;
+  MPI_Alltoallv(scomms.data(), scnts.data(), sdispls.data(), 
+          MPI_GRAPH_TYPE, rcomms.data(), rcnts.data(), rdispls.data(), 
+          MPI_GRAPH_TYPE, gcomm);
+
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me) {
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for default(shared), shared(rcsizes, rcomms, localCinfo, sinfo, rdispls), \
+          firstprivate(i), schedule(runtime) /*, if(rcsizes[i] >= 1000) */
+#else
+#pragma omp parallel for default(shared), shared(rcsizes, rcomms, localCinfo, sinfo, rdispls), \
+          firstprivate(i), schedule(guided) /*, if(rcsizes[i] >= 1000) */
+#endif
+          for (GraphElem j = 0; j < rcsizes[i]; j++) {
+              const GraphElem comm = rcomms[rdispls[i] + j];
+              sinfo[rdispls[i] + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree};
+          }
+      }
+  }
+  
+  MPI_Alltoallv(sinfo.data(), rcnts.data(), rdispls.data(), 
+          commType, rinfo.data(), scnts.data(), sdispls.data(), 
+          commType, gcomm);
+#else
+#if !defined(USE_MPI_SENDRECV)
+  std::vector<MPI_Request> rcreqs(nprocs);
+#endif
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me) {
+#if defined(USE_MPI_SENDRECV)
+#if defined(REPLACE_STL_UOSET_WITH_VECTOR)
+          MPI_Sendrecv(rcinfo[i].data(), scsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 
+                  rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 
+                  gcomm, MPI_STATUS_IGNORE);
+#else
+          std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos);
+          MPI_Sendrecv(scomms.data() + spos, scsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 
+                  rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 
+                  gcomm, MPI_STATUS_IGNORE);
+#endif
+#else
+          MPI_Irecv(rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, 
+                  CommunityTag, gcomm, &rreqs[i]);
+#if defined(REPLACE_STL_UOSET_WITH_VECTOR)
+          MPI_Isend(rcinfo[i].data(), scsizes[i], MPI_GRAPH_TYPE, i, 
+                  CommunityTag, gcomm, &sreqs[i]);
+#else
+          std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos);
+          MPI_Isend(scomms.data() + spos, scsizes[i], MPI_GRAPH_TYPE, i, 
+                  CommunityTag, gcomm, &sreqs[i]);
+#endif
+#endif
+      }
+      else {
+#if !defined(USE_MPI_SENDRECV)
+          rreqs[i] = MPI_REQUEST_NULL;
+          sreqs[i] = MPI_REQUEST_NULL;
+#endif
+      }
+      rpos += rcsizes[i];
+      spos += scsizes[i];
+  }
+
+  spos = 0;
+  rpos = 0;
+          
+  // poke progress on last isend/irecvs
+#if !defined(USE_MPI_COLLECTIVES) && !defined(USE_MPI_SENDRECV) && defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP)
+  int tf = 0, id = 0;
+  MPI_Testany(nprocs, sreqs.data(), &id, &tf, MPI_STATUS_IGNORE);
+#endif
+
+#if !defined(USE_MPI_COLLECTIVES) && !defined(USE_MPI_SENDRECV) && !defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP)
+  MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE);
+  MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE);
+#endif
+
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me) {
+#if defined(USE_MPI_SENDRECV)
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for default(shared), shared(rcsizes, rcomms, localCinfo, sinfo), \
+          firstprivate(i, rpos), schedule(runtime)
+#else
+#pragma omp parallel for default(shared), shared(rcsizes, rcomms, localCinfo, sinfo), \
+          firstprivate(i, rpos), schedule(guided)
+#endif
+          for (GraphElem j = 0; j < rcsizes[i]; j++) {
+              const GraphElem comm = rcomms[rpos + j];
+              sinfo[rpos + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree};
+          }
+          
+          MPI_Sendrecv(sinfo.data() + rpos, rcsizes[i], commType, i, CommunityDataTag, 
+                  rinfo.data() + spos, scsizes[i], commType, i, CommunityDataTag, 
+                  gcomm, MPI_STATUS_IGNORE);
+#else
+          MPI_Irecv(rinfo.data() + spos, scsizes[i], commType, i, CommunityDataTag, 
+                  gcomm, &rcreqs[i]);
+
+          // poke progress on last isend/irecvs
+#if defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP)
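+          // wait for both the earlier send to and the receive from process i;
+          // separate flags keep the second MPI_Test from overwriting the first
+          int sflag = 0, rflag = 0;
+          while (!(sflag && rflag)) {
+              MPI_Test(&sreqs[i], &sflag, MPI_STATUS_IGNORE);
+              MPI_Test(&rreqs[i], &rflag, MPI_STATUS_IGNORE);
+          }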
+#endif
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for default(shared), shared(rcsizes, rcomms, localCinfo, sinfo), \
+          firstprivate(i, rpos), schedule(runtime)
+#else
+#pragma omp parallel for default(shared), shared(rcsizes, rcomms, localCinfo, sinfo), \
+          firstprivate(i, rpos), schedule(guided)
+#endif
+          for (GraphElem j = 0; j < rcsizes[i]; j++) {
+              const GraphElem comm = rcomms[rpos + j];
+              sinfo[rpos + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree};
+          }
+
+          MPI_Isend(sinfo.data() + rpos, rcsizes[i], commType, i, 
+                  CommunityDataTag, gcomm, &sreqs[i]);
+#endif
+      }
+      else {
+#if !defined(USE_MPI_SENDRECV)
+          rcreqs[i] = MPI_REQUEST_NULL;
+          sreqs[i] = MPI_REQUEST_NULL;
+#endif
+      }
+      rpos += rcsizes[i];
+      spos += scsizes[i];
+  }
+
+#if !defined(USE_MPI_SENDRECV)
+  MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE);
+  MPI_Waitall(nprocs, rcreqs.data(), MPI_STATUSES_IGNORE);
+#endif
+
+#endif
+
+#ifdef DEBUG_PRINTF  
+  t1 = MPI_Wtime();
+  ta += (t1 - t0);
+#endif
+
+  remoteCinfo.clear();
+  remoteCupdate.clear();
+
+  for (GraphElem i = 0; i < stcsz; i++) {
+      const GraphElem ccomm = rinfo[i].community;
+
+      Comm comm;
+
+      comm.size = rinfo[i].size;
+      comm.degree = rinfo[i].degree;
+
+      remoteCinfo.insert(std::map<GraphElem,Comm>::value_type(ccomm, comm));
+      remoteCupdate.insert(std::map<GraphElem,Comm>::value_type(ccomm, Comm()));
+  }
+} // end fillRemoteCommunities
+
+void createCommunityMPIType()
+{
+  CommInfo cinfo;
+
+  MPI_Aint begin, community, size, degree;
+
+  MPI_Get_address(&cinfo, &begin);
+  MPI_Get_address(&cinfo.community, &community);
+  MPI_Get_address(&cinfo.size, &size);
+  MPI_Get_address(&cinfo.degree, &degree);
+
+  int blens[] = { 1, 1, 1 };
+  MPI_Aint displ[] = { community - begin, size - begin, degree - begin };
+  MPI_Datatype types[] = { MPI_GRAPH_TYPE, MPI_GRAPH_TYPE, MPI_WEIGHT_TYPE };
+
+  MPI_Type_create_struct(3, blens, displ, types, &commType);
+  MPI_Type_commit(&commType);
+} // createCommunityMPIType
+
+void destroyCommunityMPIType()
+{
+  MPI_Type_free(&commType);
+} // destroyCommunityMPIType
+
+void updateRemoteCommunities(const Graph &dg, std::vector<Comm> &localCinfo,
+			     const std::map<GraphElem,Comm> &remoteCupdate,
+			     const int me, const int nprocs)
+{
+  const GraphElem base = dg.get_base(me), bound = dg.get_bound(me);
+  std::vector<std::vector<CommInfo>> remoteArray(nprocs);
+  MPI_Comm gcomm = dg.get_comm();
+  
+  // TODO: this loop is serial because it walks a std::map; replacing the map
+  // with a flat container (and filling, e.g., a tbb::concurrent_vector)
+  // would allow it to run in parallel
+  for (std::map<GraphElem,Comm>::const_iterator iter = remoteCupdate.begin(); iter != remoteCupdate.end(); iter++) {
+      const GraphElem i = iter->first;
+      const Comm &curr = iter->second;
+
+      const int tproc = dg.get_owner(i);
+
+#ifdef DEBUG_PRINTF  
+      assert(tproc != me);
+#endif
+      CommInfo rcinfo;
+
+      rcinfo.community = i;
+      rcinfo.size = curr.size;
+      rcinfo.degree = curr.degree;
+
+      remoteArray[tproc].push_back(rcinfo);
+  }
+
+  std::vector<GraphElem> send_sz(nprocs), recv_sz(nprocs);
+
+#ifdef DEBUG_PRINTF  
+  GraphWeight tc = 0.0;
+  const double t0 = MPI_Wtime();
+#endif
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for schedule(runtime)
+#else
+#pragma omp parallel for schedule(static)
+#endif
+  for (int i = 0; i < nprocs; i++) {
+    send_sz[i] = remoteArray[i].size();
+  }
+
+  MPI_Alltoall(send_sz.data(), 1, MPI_GRAPH_TYPE, recv_sz.data(), 
+          1, MPI_GRAPH_TYPE, gcomm);
+
+#ifdef DEBUG_PRINTF  
+  const double t1 = MPI_Wtime();
+  tc += (t1 - t0);
+#endif
+
+  GraphElem rcnt = 0, scnt = 0;
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for shared(recv_sz, send_sz) \
+  reduction(+:rcnt, scnt) schedule(runtime)
+#else
+#pragma omp parallel for shared(recv_sz, send_sz) \
+  reduction(+:rcnt, scnt) schedule(static)
+#endif
+  for (int i = 0; i < nprocs; i++) {
+    rcnt += recv_sz[i];
+    scnt += send_sz[i];
+  }
+#ifdef DEBUG_PRINTF  
+  std::cout << "[" << me << "]Total number of remote communities to update: " << scnt << std::endl;
+#endif
+
+  GraphElem currPos = 0;
+  std::vector<CommInfo> rdata(rcnt);
+
+#ifdef DEBUG_PRINTF  
+  const double t2 = MPI_Wtime();
+#endif
+#if defined(USE_MPI_SENDRECV)
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me)
+          MPI_Sendrecv(remoteArray[i].data(), send_sz[i], commType, i, CommunityDataTag, 
+                  rdata.data() + currPos, recv_sz[i], commType, i, CommunityDataTag, 
+                  gcomm, MPI_STATUS_IGNORE);
+
+      currPos += recv_sz[i];
+  }
+#else
+  std::vector<MPI_Request> sreqs(nprocs), rreqs(nprocs);
+  for (int i = 0; i < nprocs; i++) {
+    if (i != me)
+      MPI_Irecv(rdata.data() + currPos, recv_sz[i], commType, i, 
+              CommunityDataTag, gcomm, &rreqs[i]);
+    else
+      rreqs[i] = MPI_REQUEST_NULL;
+
+    currPos += recv_sz[i];
+  }
+
+  for (int i = 0; i < nprocs; i++) {
+    if (i != me)
+      MPI_Isend(remoteArray[i].data(), send_sz[i], commType, i, 
+              CommunityDataTag, gcomm, &sreqs[i]);
+    else
+      sreqs[i] = MPI_REQUEST_NULL;
+  }
+
+  MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE);
+  MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE);
+#endif
+#ifdef DEBUG_PRINTF  
+  const double t3 = MPI_Wtime();
+  std::cout << "[" << me << "]Update remote community MPI time: " << (t3 - t2) << std::endl;
+#endif
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for shared(rdata, localCinfo) schedule(runtime)
+#else
+#pragma omp parallel for shared(rdata, localCinfo) schedule(dynamic)
+#endif
+  for (GraphElem i = 0; i < rcnt; i++) {
+    const CommInfo &curr = rdata[i];
+
+#ifdef DEBUG_PRINTF  
+    assert(dg.get_owner(curr.community) == me);
+#endif
+    localCinfo[curr.community-base].size += curr.size;
+    localCinfo[curr.community-base].degree += curr.degree;
+  }
+} // updateRemoteCommunities
+
+// initial setup before Louvain iteration begins
+#if defined(USE_MPI_RMA)
+void exchangeVertexReqs(const Graph &dg, size_t &ssz, size_t &rsz,
+        std::vector<GraphElem> &ssizes, std::vector<GraphElem> &rsizes, 
+        std::vector<GraphElem> &svdata, std::vector<GraphElem> &rvdata,
+        const int me, const int nprocs, MPI_Win &commwin)
+#else
+void exchangeVertexReqs(const Graph &dg, size_t &ssz, size_t &rsz,
+        std::vector<GraphElem> &ssizes, std::vector<GraphElem> &rsizes, 
+        std::vector<GraphElem> &svdata, std::vector<GraphElem> &rvdata,
+        const int me, const int nprocs)
+#endif
+{
+  const GraphElem base = dg.get_base(me), bound = dg.get_bound(me);
+  const GraphElem nv = dg.get_lnv();
+  MPI_Comm gcomm = dg.get_comm();
+
+#ifdef USE_OPENMP_LOCK
+  std::vector<omp_lock_t> locks(nprocs);
+  for (int i = 0; i < nprocs; i++)
+    omp_init_lock(&locks[i]);
+#endif
+  std::vector<std::unordered_set<GraphElem>> parray(nprocs);
+
+#ifdef USE_OPENMP_LOCK
+#pragma omp parallel default(shared), shared(dg, locks, parray)
+#else
+#pragma omp parallel default(shared), shared(dg, parray)
+#endif
+  {
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp for schedule(runtime)
+#else
+#pragma omp for schedule(guided)
+#endif
+    for (GraphElem i = 0; i < nv; i++) {
+      GraphElem e0, e1;
+
+      dg.edge_range(i, e0, e1);
+
+      for (GraphElem j = e0; j < e1; j++) {
+          const Edge &edge = dg.get_edge(j);
+          const int tproc = dg.get_owner(edge.tail_);
+
+          if (tproc != me) {
+#ifdef USE_OPENMP_LOCK
+              omp_set_lock(&locks[tproc]);
+#else
+              lock();
+#endif
+              parray[tproc].insert(edge.tail_);
+#ifdef USE_OPENMP_LOCK
+              omp_unset_lock(&locks[tproc]);
+#else
+              unlock();
+#endif
+          }
+      }
+    }
+  }
+
+#ifdef USE_OPENMP_LOCK
+  for (int i = 0; i < nprocs; i++) {
+    omp_destroy_lock(&locks[i]);
+  }
+#endif
+  
+  rsizes.resize(nprocs);
+  ssizes.resize(nprocs);
+  ssz = 0;
+  rsz = 0;
+
+  int pproc = 0;
+  // TODO FIXME parallelize this loop
+  for (std::vector<std::unordered_set<GraphElem>>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) {
+    ssz += iter->size();
+    ssizes[pproc] = iter->size();
+    pproc++;
+  }
+
+  MPI_Alltoall(ssizes.data(), 1, MPI_GRAPH_TYPE, rsizes.data(), 
+          1, MPI_GRAPH_TYPE, gcomm);
+
+  GraphElem rsz_r = 0;
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for shared(rsizes) \
+  reduction(+:rsz_r) schedule(runtime)
+#else
+#pragma omp parallel for shared(rsizes) \
+  reduction(+:rsz_r) schedule(static)
+#endif
+  for (int i = 0; i < nprocs; i++)
+    rsz_r += rsizes[i];
+  rsz = rsz_r;
+  
+  svdata.resize(ssz);
+  rvdata.resize(rsz);
+
+  GraphElem cpos = 0, rpos = 0;
+  pproc = 0;
+
+#if defined(USE_MPI_COLLECTIVES)
+  std::vector<int> scnts(nprocs), rcnts(nprocs), sdispls(nprocs), rdispls(nprocs);
+  
+  for (std::vector<std::unordered_set<GraphElem>>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) {
+      std::copy(iter->begin(), iter->end(), svdata.begin() + cpos);
+      
+      scnts[pproc] = iter->size();
+      rcnts[pproc] = rsizes[pproc];
+      sdispls[pproc] = cpos;
+      rdispls[pproc] = rpos;
+      cpos += iter->size();
+      rpos += rcnts[pproc];
+
+      pproc++;
+  }
+
+  scnts[me] = 0;
+  rcnts[me] = 0;
+  MPI_Alltoallv(svdata.data(), scnts.data(), sdispls.data(), 
+          MPI_GRAPH_TYPE, rvdata.data(), rcnts.data(), rdispls.data(), 
+          MPI_GRAPH_TYPE, gcomm);
+#else
+  std::vector<MPI_Request> rreqs(nprocs), sreqs(nprocs);
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me)
+          MPI_Irecv(rvdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, 
+                  VertexTag, gcomm, &rreqs[i]);
+      else
+          rreqs[i] = MPI_REQUEST_NULL;
+
+      rpos += rsizes[i];
+  }
+
+  for (std::vector<std::unordered_set<GraphElem>>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) {
+      std::copy(iter->begin(), iter->end(), svdata.begin() + cpos);
+
+      if (me != pproc)
+          MPI_Isend(svdata.data() + cpos, iter->size(), MPI_GRAPH_TYPE, pproc, 
+                  VertexTag, gcomm, &sreqs[pproc]);
+      else
+          sreqs[pproc] = MPI_REQUEST_NULL;
+
+      cpos += iter->size();
+      pproc++;
+  }
+
+  MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE);
+  MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE);
+#endif
+
+  std::swap(svdata, rvdata);
+  std::swap(ssizes, rsizes);
+  std::swap(ssz, rsz);
+
+  // create MPI window for communities
+#if defined(USE_MPI_RMA)  
+  GraphElem *ptr = nullptr;
+  MPI_Info info = MPI_INFO_NULL;
+#if defined(USE_MPI_ACCUMULATE)
+  MPI_Info_create(&info);
+  MPI_Info_set(info, "accumulate_ordering", "none");
+  MPI_Info_set(info, "accumulate_ops", "same_op");
+#endif
+  MPI_Win_allocate(rsz*sizeof(GraphElem), sizeof(GraphElem), 
+          info, gcomm, &ptr, &commwin);
+#if defined(USE_MPI_ACCUMULATE)
+  // the hints were consumed by MPI_Win_allocate; release the info object
+  MPI_Info_free(&info);
+#endif
+  MPI_Win_lock_all(MPI_MODE_NOCHECK, commwin);
+#endif
+} // exchangeVertexReqs
+
+#if defined(USE_MPI_RMA)
+GraphWeight distLouvainMethod(const int me, const int nprocs, const Graph &dg,
+        size_t &ssz, size_t &rsz, std::vector<GraphElem> &ssizes, std::vector<GraphElem> &rsizes, 
+        std::vector<GraphElem> &svdata, std::vector<GraphElem> &rvdata, const GraphWeight lower, 
+        const GraphWeight thresh, int &iters, MPI_Win &commwin)
+#else
+GraphWeight distLouvainMethod(const int me, const int nprocs, const Graph &dg,
+        size_t &ssz, size_t &rsz, std::vector<GraphElem> &ssizes, std::vector<GraphElem> &rsizes, 
+        std::vector<GraphElem> &svdata, std::vector<GraphElem> &rvdata, const GraphWeight lower, 
+        const GraphWeight thresh, int &iters)
+#endif
+{
+  std::vector<GraphElem> pastComm, currComm, targetComm;
+  std::vector<GraphWeight> vDegree;
+  std::vector<GraphWeight> clusterWeight;
+  std::vector<Comm> localCinfo, localCupdate;
+ 
+  std::unordered_map<GraphElem, GraphElem> remoteComm;
+  std::map<GraphElem,Comm> remoteCinfo, remoteCupdate;
+  
+  const GraphElem nv = dg.get_lnv();
+  MPI_Comm gcomm = dg.get_comm();
+
+  GraphWeight constantForSecondTerm;
+  GraphWeight prevMod = lower;
+  GraphWeight currMod = -1.0;
+  int numIters = 0;
+  
+  distInitLouvain(dg, pastComm, currComm, vDegree, clusterWeight, localCinfo, 
+          localCupdate, constantForSecondTerm, me);
+  targetComm.resize(nv);
+
+#ifdef DEBUG_PRINTF  
+  std::cout << "[" << me << "]constantForSecondTerm: " << constantForSecondTerm << std::endl;
+  if (me == 0)
+      std::cout << "Threshold: " << thresh << std::endl;
+#endif
+  const GraphElem base = dg.get_base(me), bound = dg.get_bound(me);
+
+#ifdef DEBUG_PRINTF  
+  double t0, t1;
+  t0 = MPI_Wtime();
+#endif
+
+  // setup vertices and communities
+#if defined(USE_MPI_RMA)
+  exchangeVertexReqs(dg, ssz, rsz, ssizes, rsizes, 
+          svdata, rvdata, me, nprocs, commwin);
+  
+  // store the remote displacements 
+  std::vector<MPI_Aint> disp(nprocs);
+  MPI_Exscan(ssizes.data(), (GraphElem*)disp.data(), nprocs, MPI_GRAPH_TYPE, 
+          MPI_SUM, gcomm);
+#else
+  exchangeVertexReqs(dg, ssz, rsz, ssizes, rsizes, 
+          svdata, rvdata, me, nprocs);
+#endif
+
+#ifdef DEBUG_PRINTF  
+  t1 = MPI_Wtime();
+  std::cout << "[" << me << "]Initial communication setup time before Louvain iteration (in s): " << (t1 - t0) << std::endl;
+#endif
+ 
+  // start Louvain iteration
+  while(true) {
+#ifdef DEBUG_PRINTF  
+    const double t2 = MPI_Wtime();
+    if (me == 0)
+        std::cout << "Starting Louvain iteration: " << numIters << std::endl;
+#endif
+    numIters++;
+
+#ifdef DEBUG_PRINTF  
+    t0 = MPI_Wtime();
+#endif
+
+#if defined(USE_MPI_RMA)
+    fillRemoteCommunities(dg, me, nprocs, ssz, rsz, ssizes, 
+            rsizes, svdata, rvdata, currComm, localCinfo, 
+            remoteCinfo, remoteComm, remoteCupdate, 
+            commwin, disp);
+#else
+    fillRemoteCommunities(dg, me, nprocs, ssz, rsz, ssizes, 
+            rsizes, svdata, rvdata, currComm, localCinfo, 
+            remoteCinfo, remoteComm, remoteCupdate);
+#endif
+
+#ifdef DEBUG_PRINTF  
+    t1 = MPI_Wtime();
+    std::cout << "[" << me << "]Remote community map size: " << remoteComm.size() << std::endl;
+    std::cout << "[" << me << "]Iteration communication time: " << (t1 - t0) << std::endl;
+#endif
+
+#ifdef DEBUG_PRINTF  
+    t0 = MPI_Wtime();
+#endif
+
+#pragma omp parallel default(none), shared(clusterWeight, localCupdate, currComm, targetComm, \
+        vDegree, localCinfo, remoteCinfo, remoteComm, pastComm, dg, remoteCupdate), \
+    firstprivate(constantForSecondTerm)
+    {
+        distCleanCWandCU(nv, clusterWeight, localCupdate);
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp for schedule(runtime)
+#else
+#pragma omp for schedule(guided) 
+#endif
+        for (GraphElem i = 0; i < nv; i++) {
+            distExecuteLouvainIteration(i, dg, currComm, targetComm, vDegree, localCinfo, 
+                    localCupdate, remoteComm, remoteCinfo, remoteCupdate,
+                    constantForSecondTerm, clusterWeight, me);
+        }
+    }
+
+#pragma omp parallel default(none), shared(localCinfo, localCupdate)
+    {
+        distUpdateLocalCinfo(localCinfo, localCupdate);
+    }
+
+    // communicate remote communities
+    updateRemoteCommunities(dg, localCinfo, remoteCupdate, me, nprocs);
+
+    // compute modularity
+    currMod = distComputeModularity(dg, localCinfo, clusterWeight, constantForSecondTerm, me);
+
+    // exit criteria
+    if (currMod - prevMod < thresh)
+        break;
+
+    prevMod = currMod;
+    if (prevMod < lower)
+        prevMod = lower;
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for default(none) \
+    shared(pastComm, currComm, targetComm) \
+    schedule(runtime)
+#else
+#pragma omp parallel for default(none) \
+    shared(pastComm, currComm, targetComm) \
+    schedule(static)
+#endif
+    for (GraphElem i = 0; i < nv; i++) {
+        GraphElem tmp = pastComm[i];
+        pastComm[i] = currComm[i];
+        currComm[i] = targetComm[i];
+        targetComm[i] = tmp;
+    }
+  } // end of Louvain iteration
+
+#if defined(USE_MPI_RMA)
+  MPI_Win_unlock_all(commwin);
+  MPI_Win_free(&commwin);
+#endif  
+
+  iters = numIters;
+
+  vDegree.clear();
+  pastComm.clear();
+  currComm.clear();
+  targetComm.clear();
+  clusterWeight.clear();
+  localCinfo.clear();
+  localCupdate.clear();
+  
+  return prevMod;
+} // distLouvainMethod plain
+
+#endif // __DSPL
diff --git a/miniVite/dspl_gpu.hpp b/miniVite/dspl_gpu.hpp
new file mode 100644
index 0000000..601a382
--- /dev/null
+++ b/miniVite/dspl_gpu.hpp
@@ -0,0 +1,1409 @@
+// ***********************************************************************
+//
+//                              miniVite
+//
+// ***********************************************************************
+//
+//       Copyright (2018) Battelle Memorial Institute
+//                      All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions
+// are met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the copyright holder nor the names of its
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+// COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+// BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+// ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+// POSSIBILITY OF SUCH DAMAGE.
+//
+// ************************************************************************ 
+
+#pragma once
+#ifndef DSPL_HPP
+#define DSPL_HPP
+
+#include <algorithm>
+#include <fstream>
+#include <functional>
+#include <iostream>
+#include <list>
+#include <numeric>
+#include <vector>
+#include <unordered_map>
+#include <unordered_set>
+#include <map>
+
+#include <mpi.h>
+#include <omp.h>
+
+#include "graph.hpp"
+#include "utils.hpp"
+
+struct Comm {
+  GraphElem size;
+  GraphWeight degree;
+
+  Comm() : size(0), degree(0.0) {}
+};
+
+struct CommInfo {
+    GraphElem community;
+    GraphElem size;
+    GraphWeight degree;
+};
+
+const int SizeTag           = 1;
+const int VertexTag         = 2;
+const int CommunityTag      = 3;
+const int CommunitySizeTag  = 4;
+const int CommunityDataTag  = 5;
+
+static MPI_Datatype commType;
+
+void distSumVertexDegree(const Graph &g, std::vector<GraphWeight> &vDegree, std::vector<Comm> &localCinfo)
+{
+  const GraphElem nv = g.get_lnv();
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for default(none), shared(g, vDegree, localCinfo), schedule(runtime)
+#else
+#pragma omp parallel for default(none), shared(g, vDegree, localCinfo), schedule(guided)
+#endif
+  for (GraphElem i = 0; i < nv; i++) {
+    GraphElem e0, e1;
+    GraphWeight tw = 0.0;
+
+    g.edge_range(i, e0, e1);
+
+    for (GraphElem k = e0; k < e1; k++) {
+      const Edge &edge = g.get_edge(k);
+      tw += edge.weight_;
+    }
+
+    vDegree[i] = tw;
+   
+    localCinfo[i].degree = tw;
+    localCinfo[i].size = 1L;
+  }
+} // distSumVertexDegree
+
+GraphWeight distCalcConstantForSecondTerm(const std::vector<GraphWeight> &vDegree, MPI_Comm gcomm)
+{
+  GraphWeight totalEdgeWeightTwice = 0.0;
+  GraphWeight localWeight = 0.0;
+  int me = -1;
+
+  const size_t vsz = vDegree.size();
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for default(none), shared(vDegree), reduction(+: localWeight) schedule(runtime)
+#else
+#pragma omp parallel for default(none), shared(vDegree), reduction(+: localWeight) schedule(static)
+#endif  
+  for (GraphElem i = 0; i < vsz; i++)
+    localWeight += vDegree[i]; // Local reduction
+
+  // Global reduction
+  MPI_Allreduce(&localWeight, &totalEdgeWeightTwice, 1, 
+          MPI_WEIGHT_TYPE, MPI_SUM, gcomm);
+
+  return (1.0 / static_cast<GraphWeight>(totalEdgeWeightTwice));
+} // distCalcConstantForSecondTerm
+
+void distInitComm(std::vector<GraphElem> &pastComm, std::vector<GraphElem> &currComm, const GraphElem base)
+{
+  const size_t csz = currComm.size();
+
+#ifdef DEBUG_PRINTF  
+  assert(csz == pastComm.size());
+#endif
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for default(none), shared(pastComm, currComm), schedule(runtime)
+#else
+#pragma omp parallel for default(none), shared(pastComm, currComm), schedule(static)
+#endif
+  for (GraphElem i = 0L; i < csz; i++) {
+    pastComm[i] = i + base;
+    currComm[i] = i + base;
+  }
+} // distInitComm
+
+void distInitLouvain(const Graph &dg, std::vector<GraphElem> &pastComm, 
+        std::vector<GraphElem> &currComm, std::vector<GraphWeight> &vDegree, 
+        std::vector<GraphWeight> &clusterWeight, std::vector<Comm> &localCinfo, 
+        std::vector<Comm> &localCupdate, GraphWeight &constantForSecondTerm,
+        const int me)
+{
+  const GraphElem base = dg.get_base(me);
+  const GraphElem nv = dg.get_lnv();
+  MPI_Comm gcomm = dg.get_comm();
+
+  vDegree.resize(nv);
+  pastComm.resize(nv);
+  currComm.resize(nv);
+  clusterWeight.resize(nv);
+  localCinfo.resize(nv);
+  localCupdate.resize(nv);
+ 
+  distSumVertexDegree(dg, vDegree, localCinfo);
+  constantForSecondTerm = distCalcConstantForSecondTerm(vDegree, gcomm);
+
+  distInitComm(pastComm, currComm, base);
+} // distInitLouvain
+
+GraphElem distGetMaxIndex(const std::unordered_map<GraphElem, GraphElem> &clmap, const std::vector<GraphWeight> &counter,
+			  const GraphWeight selfLoop, const std::vector<Comm> &localCinfo, 
+			  const std::map<GraphElem,Comm> &remoteCinfo, const GraphWeight vDegree, 
+                          const GraphElem currSize, const GraphWeight currDegree, const GraphElem currComm,
+			  const GraphElem base, const GraphElem bound, const GraphWeight constant)
+{
+  std::unordered_map<GraphElem, GraphElem>::const_iterator storedAlready;
+  GraphElem maxIndex = currComm;
+  GraphWeight curGain = 0.0, maxGain = 0.0;
+  GraphWeight eix = static_cast<GraphWeight>(counter[0]) - static_cast<GraphWeight>(selfLoop);
+
+  GraphWeight ax = currDegree - vDegree;
+  GraphWeight eiy = 0.0, ay = 0.0;
+
+  GraphElem maxSize = currSize; 
+  GraphElem size = 0;
+
+  storedAlready = clmap.begin();
+#ifdef DEBUG_PRINTF  
+  assert(storedAlready != clmap.end());
+#endif
+  do {
+      if (currComm != storedAlready->first) {
+
+          // is_local, direct access local info
+          if ((storedAlready->first >= base) && (storedAlready->first < bound)) {
+              ay = localCinfo[storedAlready->first-base].degree;
+              size = localCinfo[storedAlready->first - base].size;   
+          }
+          else {
+              // is_remote, lookup map
+              std::map<GraphElem,Comm>::const_iterator citer = remoteCinfo.find(storedAlready->first);
+              ay = citer->second.degree;
+              size = citer->second.size; 
+          }
+
+          eiy = counter[storedAlready->second];
+
+          curGain = 2.0 * (eiy - eix) - 2.0 * vDegree * (ay - ax) * constant;
+
+          if ((curGain > maxGain) ||
+                  ((curGain == maxGain) && (curGain != 0.0) && (storedAlready->first < maxIndex))) {
+              maxGain = curGain;
+              maxIndex = storedAlready->first;
+              maxSize = size;
+          }
+      }
+      storedAlready++;
+  } while (storedAlready != clmap.end());
+
+  if ((maxSize == 1) && (currSize == 1) && (maxIndex > currComm))
+    maxIndex = currComm;
+
+  return maxIndex;
+} // distGetMaxIndex
+
+GraphWeight distBuildLocalMapCounter(const GraphElem e0, const GraphElem e1, std::unordered_map<GraphElem, GraphElem> &clmap, 
+				   std::vector<GraphWeight> &counter, const Graph &g, 
+                                   const std::vector<GraphElem> &currComm, 
+                                   const std::unordered_map<GraphElem, GraphElem> &remoteComm,
+	                           const GraphElem vertex, const GraphElem base, const GraphElem bound)
+{
+  GraphElem numUniqueClusters = 1L;
+  GraphWeight selfLoop = 0;
+  std::unordered_map<GraphElem, GraphElem>::const_iterator storedAlready;
+
+  for (GraphElem j = e0; j < e1; j++) {
+        
+    const Edge &edge = g.get_edge(j);
+    const GraphElem &tail_ = edge.tail_;
+    const GraphWeight &weight = edge.weight_;
+    GraphElem tcomm;
+
+    if (tail_ == vertex + base)
+      selfLoop += weight;
+
+    // is_local, direct access local std::vector<GraphElem>
+    if ((tail_ >= base) && (tail_ < bound))
+      tcomm = currComm[tail_ - base];
+    else { // is_remote, lookup map
+      std::unordered_map<GraphElem, GraphElem>::const_iterator iter = remoteComm.find(tail_);
+
+#ifdef DEBUG_PRINTF  
+      assert(iter != remoteComm.end());
+#endif
+      tcomm = iter->second;
+    }
+
+    storedAlready = clmap.find(tcomm);
+    
+    if (storedAlready != clmap.end())
+      counter[storedAlready->second] += weight;
+    else {
+        clmap.insert(std::unordered_map<GraphElem, GraphElem>::value_type(tcomm, numUniqueClusters));
+        counter.push_back(weight);
+        numUniqueClusters++;
+    }
+  }
+
+  return selfLoop;
+} // distBuildLocalMapCounter
+
+void distExecuteLouvainIteration(const GraphElem i, const Graph &dg, const std::vector<GraphElem> &currComm,
+				 std::vector<GraphElem> &targetComm, const std::vector<GraphWeight> &vDegree,
+                                 std::vector<Comm> &localCinfo, std::vector<Comm> &localCupdate,
+				 const std::unordered_map<GraphElem, GraphElem> &remoteComm, 
+                                 const std::map<GraphElem,Comm> &remoteCinfo, 
+                                 std::map<GraphElem,Comm> &remoteCupdate, const GraphWeight constantForSecondTerm,
+                                 std::vector<GraphWeight> &clusterWeight, const int me)
+{
+  GraphElem localTarget = -1, e0, e1;
+  GraphWeight selfLoop = 0.0; // weight, not an index: matches distBuildLocalMapCounter's return type
+  std::unordered_map<GraphElem, GraphElem> clmap;
+  std::vector<GraphWeight> counter;
+
+  const GraphElem base = dg.get_base(me), bound = dg.get_bound(me);
+  const GraphElem cc = currComm[i];
+  GraphWeight ccDegree;
+  GraphElem ccSize;  
+  bool currCommIsLocal = false; 
+  bool targetCommIsLocal = false;
+
+  // Current Community is local
+  if (cc >= base && cc < bound) {
+    ccDegree = localCinfo[cc - base].degree;
+    ccSize = localCinfo[cc - base].size;
+    currCommIsLocal = true;
+  } else {
+    // is remote
+    std::map<GraphElem,Comm>::const_iterator citer = remoteCinfo.find(cc);
+    ccDegree = citer->second.degree;
+    ccSize = citer->second.size;
+    currCommIsLocal = false;
+  }
+
+  dg.edge_range(i, e0, e1);
+
+  if (e0 != e1) {
+    clmap.insert(std::unordered_map<GraphElem, GraphElem>::value_type(cc, 0));
+    counter.push_back(0.0);
+
+    selfLoop =  distBuildLocalMapCounter(e0, e1, clmap, counter, dg, 
+                    currComm, remoteComm, i, base, bound);
+
+    clusterWeight[i] += counter[0];
+
+    localTarget = distGetMaxIndex(clmap, counter, selfLoop, localCinfo, remoteCinfo, 
+                    vDegree[i], ccSize, ccDegree, cc, base, bound, constantForSecondTerm);
+  }
+  else
+    localTarget = cc;
+
+   // is the Target Local?
+   if (localTarget >= base && localTarget < bound)
+      targetCommIsLocal = true;
+  
+  // current and target comm are local - atomic updates to vectors
+  if ((localTarget != cc) && (localTarget != -1) && currCommIsLocal && targetCommIsLocal) {
+        
+#ifdef DEBUG_PRINTF  
+        assert((localTarget >= base) && (localTarget < bound));
+        assert((cc >= base) && (cc < bound));
+	assert( cc - base < localCupdate.size()); 	
+	assert( localTarget - base < localCupdate.size()); 	
+#endif
+        #pragma omp atomic update
+        localCupdate[localTarget-base].degree += vDegree[i];
+        #pragma omp atomic update
+        localCupdate[localTarget-base].size++;
+        #pragma omp atomic update
+        localCupdate[cc-base].degree -= vDegree[i];
+        #pragma omp atomic update
+        localCupdate[cc-base].size--;
+     }	
+
+  // current is local, target is not - do atomic on local, accumulate in Maps for remote
+  if ((localTarget != cc) && (localTarget != -1) && currCommIsLocal && !targetCommIsLocal) {
+        #pragma omp atomic update
+        localCupdate[cc-base].degree -= vDegree[i];
+        #pragma omp atomic update
+        localCupdate[cc-base].size--;
+ 
+        // search target!     
+        std::map<GraphElem,Comm>::iterator iter=remoteCupdate.find(localTarget);
+ 
+        #pragma omp atomic update
+        iter->second.degree += vDegree[i];
+        #pragma omp atomic update
+        iter->second.size++;
+  }
+        
+   // current is remote, target is local - accumulate for current, atomic on local
+   if ((localTarget != cc) && (localTarget != -1) && !currCommIsLocal && targetCommIsLocal) {
+        #pragma omp atomic update
+        localCupdate[localTarget-base].degree += vDegree[i];
+        #pragma omp atomic update
+        localCupdate[localTarget-base].size++;
+       
+        // search current 
+        std::map<GraphElem,Comm>::iterator iter=remoteCupdate.find(cc);
+  
+        #pragma omp atomic update
+        iter->second.degree -= vDegree[i];
+        #pragma omp atomic update
+        iter->second.size--;
+   }
+                    
+   // current and target are remote - accumulate for both
+   if ((localTarget != cc) && (localTarget != -1) && !currCommIsLocal && !targetCommIsLocal) {
+       
+        // search current 
+        std::map<GraphElem,Comm>::iterator iter = remoteCupdate.find(cc);
+  
+        #pragma omp atomic update
+        iter->second.degree -= vDegree[i];
+        #pragma omp atomic update
+        iter->second.size--;
+   
+        // search target
+        iter=remoteCupdate.find(localTarget);
+  
+        #pragma omp atomic update
+        iter->second.degree += vDegree[i];
+        #pragma omp atomic update
+        iter->second.size++;
+   }
+
+#ifdef DEBUG_PRINTF  
+  assert(localTarget != -1);
+#endif
+  targetComm[i] = localTarget;
+} // distExecuteLouvainIteration
+
+GraphWeight distComputeModularity(const Graph &g, std::vector<Comm> &localCinfo,
+			     const std::vector<GraphWeight> &clusterWeight,
+			     const GraphWeight constantForSecondTerm,
+			     const int me)
+{
+  const GraphElem nv = g.get_lnv();
+  MPI_Comm gcomm = g.get_comm();
+
+  GraphWeight le_la_xx[2];
+  GraphWeight e_a_xx[2] = {0.0, 0.0};
+  GraphWeight le_xx = 0.0, la2_x = 0.0;
+
+#ifdef DEBUG_PRINTF  
+  assert((clusterWeight.size() == nv));
+#endif
+
+#if defined(OMP_GPU)
+#pragma omp target teams distribute parallel for map(to: clusterWeight, localCinfo) reduction(+: le_xx), reduction(+: la2_x)
+#elif defined(OMP_SCHEDULE_RUNTIME)
+#pragma omp parallel for default(none), shared(clusterWeight, localCinfo), \
+  reduction(+: le_xx), reduction(+: la2_x) schedule(runtime)
+#else
+#pragma omp parallel for default(none), shared(clusterWeight, localCinfo), \
+  reduction(+: le_xx), reduction(+: la2_x) schedule(static)
+#endif
+  for (GraphElem i = 0L; i < nv; i++) {
+    le_xx += clusterWeight[i];
+    la2_x += static_cast<GraphWeight>(localCinfo[i].degree) * static_cast<GraphWeight>(localCinfo[i].degree); 
+  } 
+  le_la_xx[0] = le_xx;
+  le_la_xx[1] = la2_x;
+
+#ifdef DEBUG_PRINTF  
+  const double t0 = MPI_Wtime();
+#endif
+
+  MPI_Allreduce(le_la_xx, e_a_xx, 2, MPI_WEIGHT_TYPE, MPI_SUM, gcomm);
+
+#ifdef DEBUG_PRINTF  
+  const double t1 = MPI_Wtime();
+#endif
+
+  GraphWeight currMod = (e_a_xx[0] * constantForSecondTerm) - 
+      (e_a_xx[1] * constantForSecondTerm * constantForSecondTerm);
+#ifdef DEBUG_PRINTF  
+  std::cout << "[" << me << "]le_xx: " << le_xx << ", la2_x: " << la2_x << std::endl;
+  std::cout << "[" << me << "]e_xx: " << e_a_xx[0] << ", a2_x: " << e_a_xx[1] << ", currMod: " << currMod << std::endl;
+  std::cout << "[" << me << "]Reduction time: " << (t1 - t0) << std::endl;
+#endif
+
+  return currMod;
+} // distComputeModularity
+
+void distUpdateLocalCinfo(std::vector<Comm> &localCinfo, const std::vector<Comm> &localCupdate)
+{
+    size_t csz = localCinfo.size();
+
+#if defined(OMP_GPU)
+#pragma omp target teams distribute parallel for
+#elif defined(OMP_SCHEDULE_RUNTIME)
+#pragma omp for schedule(runtime)
+#else
+#pragma omp for schedule(static)
+#endif
+    for (GraphElem i = 0L; i < csz; i++) {
+        localCinfo[i].size += localCupdate[i].size;
+        localCinfo[i].degree += localCupdate[i].degree;
+    }
+} // distUpdateLocalCinfo
+
+void distCleanCWandCU(const GraphElem nv, std::vector<GraphWeight> &clusterWeight,
+        std::vector<Comm> &localCupdate)
+{
+#if defined(OMP_GPU)
+#pragma omp target teams distribute parallel for
+#elif defined(OMP_SCHEDULE_RUNTIME)
+#pragma omp for schedule(runtime)
+#else
+#pragma omp for schedule(static)
+#endif
+    for (GraphElem i = 0L; i < nv; i++) {
+        clusterWeight[i] = 0;
+        localCupdate[i].degree = 0;
+        localCupdate[i].size = 0;
+    }
+} // distCleanCWandCU
+
+#if defined(USE_MPI_RMA)
+void fillRemoteCommunities(const Graph &dg, const int me, const int nprocs,
+        const size_t &ssz, const size_t &rsz, const std::vector<GraphElem> &ssizes, 
+        const std::vector<GraphElem> &rsizes, const std::vector<GraphElem> &svdata, 
+        const std::vector<GraphElem> &rvdata, const std::vector<GraphElem> &currComm, 
+        const std::vector<Comm> &localCinfo, std::map<GraphElem,Comm> &remoteCinfo, 
+        std::unordered_map<GraphElem, GraphElem> &remoteComm, std::map<GraphElem,Comm> &remoteCupdate, 
+        const MPI_Win &commwin, const std::vector<MPI_Aint> &disp)
+#else
+void fillRemoteCommunities(const Graph &dg, const int me, const int nprocs,
+        const size_t &ssz, const size_t &rsz, const std::vector<GraphElem> &ssizes, 
+        const std::vector<GraphElem> &rsizes, const std::vector<GraphElem> &svdata, 
+        const std::vector<GraphElem> &rvdata, const std::vector<GraphElem> &currComm, 
+        const std::vector<Comm> &localCinfo, std::map<GraphElem,Comm> &remoteCinfo, 
+        std::unordered_map<GraphElem, GraphElem> &remoteComm, std::map<GraphElem,Comm> &remoteCupdate)
+#endif
+{
+#if defined(USE_MPI_RMA)
+    std::vector<GraphElem> scdata(ssz);
+#else
+    std::vector<GraphElem> rcdata(rsz), scdata(ssz);
+#endif
+  GraphElem spos, rpos;
+#if defined(REPLACE_STL_UOSET_WITH_VECTOR)
+  std::vector< std::vector< GraphElem > > rcinfo(nprocs);
+#else
+  std::vector<std::unordered_set<GraphElem> > rcinfo(nprocs);
+#endif
+
+#if defined(USE_MPI_SENDRECV)
+#else
+  std::vector<MPI_Request> rreqs(nprocs), sreqs(nprocs);
+#endif
+
+#ifdef DEBUG_PRINTF  
+  double t0, t1, ta = 0.0;
+#endif
+
+  const GraphElem base = dg.get_base(me), bound = dg.get_bound(me);
+  const GraphElem nv = dg.get_lnv();
+  MPI_Comm gcomm = dg.get_comm();
+
+  // Collects Communities of local vertices for remote nodes
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for shared(svdata, scdata, currComm) schedule(runtime)
+#else
+#pragma omp parallel for shared(svdata, scdata, currComm) schedule(static)
+#endif
+  for (GraphElem i = 0; i < ssz; i++) {
+    const GraphElem vertex = svdata[i];
+#ifdef DEBUG_PRINTF  
+    assert((vertex >= base) && (vertex < bound));
+#endif
+    const GraphElem comm = currComm[vertex - base];
+    scdata[i] = comm;
+  }
+
+  std::vector<GraphElem> rcsizes(nprocs), scsizes(nprocs);
+  std::vector<CommInfo> sinfo, rinfo;
+
+#ifdef DEBUG_PRINTF  
+  t0 = MPI_Wtime();
+#endif
+  spos = 0;
+  rpos = 0;
+#if defined(USE_MPI_COLLECTIVES)
+  std::vector<int> scnts(nprocs), rcnts(nprocs), sdispls(nprocs), rdispls(nprocs);
+  for (int i = 0; i < nprocs; i++) {
+      scnts[i] = ssizes[i];
+      rcnts[i] = rsizes[i];
+      sdispls[i] = spos;
+      rdispls[i] = rpos;
+      spos += scnts[i];
+      rpos += rcnts[i];
+  }
+  scnts[me] = 0;
+  rcnts[me] = 0;
+  MPI_Alltoallv(scdata.data(), scnts.data(), sdispls.data(), 
+          MPI_GRAPH_TYPE, rcdata.data(), rcnts.data(), rdispls.data(), 
+          MPI_GRAPH_TYPE, gcomm);
+#elif defined(USE_MPI_RMA)
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me) {
+#if defined(USE_MPI_ACCUMULATE)
+          MPI_Accumulate(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, 
+                  disp[i], ssizes[i], MPI_GRAPH_TYPE, MPI_REPLACE, commwin);
+#else
+          MPI_Put(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, 
+                  disp[i], ssizes[i], MPI_GRAPH_TYPE, commwin);
+#endif
+      }
+      spos += ssizes[i];
+      rpos += rsizes[i];
+  }
+#elif defined(USE_MPI_SENDRECV)
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me)
+          MPI_Sendrecv(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 
+                  rcdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 
+                  gcomm, MPI_STATUS_IGNORE);
+
+      spos += ssizes[i];
+      rpos += rsizes[i];
+  }
+#else
+  for (int i = 0; i < nprocs; i++) {
+    if (i != me)
+      MPI_Irecv(rcdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, 
+              CommunityTag, gcomm, &rreqs[i]);
+    else
+      rreqs[i] = MPI_REQUEST_NULL;
+
+    rpos += rsizes[i];
+  }
+  for (int i = 0; i < nprocs; i++) {
+    if (i != me)
+      MPI_Isend(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, 
+              CommunityTag, gcomm, &sreqs[i]);
+    else
+      sreqs[i] = MPI_REQUEST_NULL;
+
+    spos += ssizes[i];
+  }
+
+  MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE);
+  MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE);
+#endif
+#ifdef DEBUG_PRINTF  
+  t1 = MPI_Wtime();
+  ta += (t1 - t0);
+#endif
+
+  // reserve vectors
+#if defined(REPLACE_STL_UOSET_WITH_VECTOR)
+  for (GraphElem i = 0; i < nprocs; i++) {
+      rcinfo[i].reserve(rpos);
+  }
+#endif
+
+  // fetch baseptr from MPI window
+#if defined(USE_MPI_RMA)
+  MPI_Win_flush_all(commwin);
+  MPI_Barrier(gcomm);
+
+  GraphElem *rcbuf = nullptr;
+  int flag = 0;
+  MPI_Win_get_attr(commwin, MPI_WIN_BASE, &rcbuf, &flag);
+#endif
+
+  remoteComm.clear();
+  for (GraphElem i = 0; i < rpos; i++) {
+
+#if defined(USE_MPI_RMA)
+    const GraphElem comm = rcbuf[i];
+#else
+    const GraphElem comm = rcdata[i];
+#endif
+
+    remoteComm.insert(std::unordered_map<GraphElem, GraphElem>::value_type(rvdata[i], comm));
+    const int tproc = dg.get_owner(comm);
+
+    if (tproc != me)
+#if defined(REPLACE_STL_UOSET_WITH_VECTOR)
+      rcinfo[tproc].emplace_back(comm);
+#else
+      rcinfo[tproc].insert(comm);
+#endif
+  }
+
+  for (GraphElem i = 0; i < nv; i++) {
+    const GraphElem comm = currComm[i];
+    const int tproc = dg.get_owner(comm);
+
+    if (tproc != me)
+#if defined(REPLACE_STL_UOSET_WITH_VECTOR)
+      rcinfo[tproc].emplace_back(comm);
+#else
+      rcinfo[tproc].insert(comm);
+#endif
+  }
+
+#ifdef DEBUG_PRINTF  
+  t0 = MPI_Wtime();
+#endif
+  GraphElem stcsz = 0, rtcsz = 0;
+  
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for shared(scsizes, rcinfo) \
+  reduction(+:stcsz) schedule(runtime)
+#else
+#pragma omp parallel for shared(scsizes, rcinfo) \
+  reduction(+:stcsz) schedule(static)
+#endif
+  for (int i = 0; i < nprocs; i++) {
+    scsizes[i] = rcinfo[i].size();
+    stcsz += scsizes[i];
+  }
+
+  MPI_Alltoall(scsizes.data(), 1, MPI_GRAPH_TYPE, rcsizes.data(), 
+          1, MPI_GRAPH_TYPE, gcomm);
+
+#ifdef DEBUG_PRINTF  
+  t1 = MPI_Wtime();
+  ta += (t1 - t0);
+#endif
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for shared(rcsizes) \
+  reduction(+:rtcsz) schedule(runtime)
+#else
+#pragma omp parallel for shared(rcsizes) \
+  reduction(+:rtcsz) schedule(static)
+#endif
+  for (int i = 0; i < nprocs; i++) {
+    rtcsz += rcsizes[i];
+  }
+
+#ifdef DEBUG_PRINTF  
+  std::cout << "[" << me << "]Total communities to receive: " << rtcsz << std::endl;
+#endif
+#if defined(USE_MPI_COLLECTIVES)
+  std::vector<GraphElem> rcomms(rtcsz), scomms(stcsz);
+#else
+#if defined(REPLACE_STL_UOSET_WITH_VECTOR)
+  std::vector<GraphElem> rcomms(rtcsz);
+#else
+  std::vector<GraphElem> rcomms(rtcsz), scomms(stcsz);
+#endif
+#endif
+  sinfo.resize(rtcsz);
+  rinfo.resize(stcsz);
+
+#ifdef DEBUG_PRINTF  
+  t0 = MPI_Wtime();
+#endif
+  spos = 0;
+  rpos = 0;
+#if defined(USE_MPI_COLLECTIVES)
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me) {
+          std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos);
+      }
+      scnts[i] = scsizes[i];
+      rcnts[i] = rcsizes[i];
+      sdispls[i] = spos;
+      rdispls[i] = rpos;
+      spos += scnts[i];
+      rpos += rcnts[i];
+  }
+  scnts[me] = 0;
+  rcnts[me] = 0;
+  MPI_Alltoallv(scomms.data(), scnts.data(), sdispls.data(), 
+          MPI_GRAPH_TYPE, rcomms.data(), rcnts.data(), rdispls.data(), 
+          MPI_GRAPH_TYPE, gcomm);
+
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me) {
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo, rdispls), \
+          firstprivate(i), schedule(runtime) /*, if(rcsizes[i] >= 1000) */
+#else
+#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo, rdispls), \
+          firstprivate(i), schedule(guided) /*, if(rcsizes[i] >= 1000) */
+#endif
+          for (GraphElem j = 0; j < rcsizes[i]; j++) {
+              const GraphElem comm = rcomms[rdispls[i] + j];
+              sinfo[rdispls[i] + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree};
+          }
+      }
+  }
+  
+  MPI_Alltoallv(sinfo.data(), rcnts.data(), rdispls.data(), 
+          commType, rinfo.data(), scnts.data(), sdispls.data(), 
+          commType, gcomm);
+#else
+#if !defined(USE_MPI_SENDRECV)
+  std::vector<MPI_Request> rcreqs(nprocs);
+#endif
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me) {
+#if defined(USE_MPI_SENDRECV)
+#if defined(REPLACE_STL_UOSET_WITH_VECTOR)
+          MPI_Sendrecv(rcinfo[i].data(), scsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 
+                  rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 
+                  gcomm, MPI_STATUS_IGNORE);
+#else
+          std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos);
+          MPI_Sendrecv(scomms.data() + spos, scsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 
+                  rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 
+                  gcomm, MPI_STATUS_IGNORE);
+#endif
+#else
+          MPI_Irecv(rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, 
+                  CommunityTag, gcomm, &rreqs[i]);
+#if defined(REPLACE_STL_UOSET_WITH_VECTOR)
+          MPI_Isend(rcinfo[i].data(), scsizes[i], MPI_GRAPH_TYPE, i, 
+                  CommunityTag, gcomm, &sreqs[i]);
+#else
+          std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos);
+          MPI_Isend(scomms.data() + spos, scsizes[i], MPI_GRAPH_TYPE, i, 
+                  CommunityTag, gcomm, &sreqs[i]);
+#endif
+#endif
+      }
+      else {
+#if !defined(USE_MPI_SENDRECV)
+          rreqs[i] = MPI_REQUEST_NULL;
+          sreqs[i] = MPI_REQUEST_NULL;
+#endif
+      }
+      rpos += rcsizes[i];
+      spos += scsizes[i];
+  }
+
+  spos = 0;
+  rpos = 0;
+          
+  // poke progress on last isend/irecvs
+#if !defined(USE_MPI_COLLECTIVES) && !defined(USE_MPI_SENDRECV) && defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP)
+  int tf = 0, id = 0;
+  MPI_Testany(nprocs, sreqs.data(), &id, &tf, MPI_STATUS_IGNORE);
+#endif
+
+#if !defined(USE_MPI_COLLECTIVES) && !defined(USE_MPI_SENDRECV) && !defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP)
+  MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE);
+  MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE);
+#endif
+
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me) {
+#if defined(USE_MPI_SENDRECV)
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \
+          firstprivate(i, rpos), schedule(runtime)
+#else
+#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \
+          firstprivate(i, rpos), schedule(guided)
+#endif
+          for (GraphElem j = 0; j < rcsizes[i]; j++) {
+              const GraphElem comm = rcomms[rpos + j];
+              sinfo[rpos + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree};
+          }
+          
+          MPI_Sendrecv(sinfo.data() + rpos, rcsizes[i], commType, i, CommunityDataTag, 
+                  rinfo.data() + spos, scsizes[i], commType, i, CommunityDataTag, 
+                  gcomm, MPI_STATUS_IGNORE);
+#else
+          MPI_Irecv(rinfo.data() + spos, scsizes[i], commType, i, CommunityDataTag, 
+                  gcomm, &rcreqs[i]);
+
+          // poke progress on last isend/irecvs
+#if defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP)
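+          // both the community send and receive for rank i must finish
+          // before rcomms is read and sreqs[i] is reused below
+          int sflag = 0, rflag = 0;
+          while (!sflag || !rflag) {
+              MPI_Test(&sreqs[i], &sflag, MPI_STATUS_IGNORE);
+              MPI_Test(&rreqs[i], &rflag, MPI_STATUS_IGNORE);
+          }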
+#endif
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \
+          firstprivate(i, rpos), schedule(runtime)
+#else
+#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \
+          firstprivate(i, rpos), schedule(guided)
+#endif
+          for (GraphElem j = 0; j < rcsizes[i]; j++) {
+              const GraphElem comm = rcomms[rpos + j];
+              sinfo[rpos + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree};
+          }
+
+          MPI_Isend(sinfo.data() + rpos, rcsizes[i], commType, i, 
+                  CommunityDataTag, gcomm, &sreqs[i]);
+#endif
+      }
+      else {
+#if !defined(USE_MPI_SENDRECV)
+          rcreqs[i] = MPI_REQUEST_NULL;
+          sreqs[i] = MPI_REQUEST_NULL;
+#endif
+      }
+      rpos += rcsizes[i];
+      spos += scsizes[i];
+  }
+
+#if !defined(USE_MPI_SENDRECV)
+  MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE);
+  MPI_Waitall(nprocs, rcreqs.data(), MPI_STATUSES_IGNORE);
+#endif
+
+#endif
+
+#ifdef DEBUG_PRINTF  
+  t1 = MPI_Wtime();
+  ta += (t1 - t0);
+#endif
+
+  remoteCinfo.clear();
+  remoteCupdate.clear();
+
+  for (GraphElem i = 0; i < stcsz; i++) {
+      const GraphElem ccomm = rinfo[i].community;
+
+      Comm comm;
+
+      comm.size = rinfo[i].size;
+      comm.degree = rinfo[i].degree;
+
+      remoteCinfo.insert(std::map<GraphElem,Comm>::value_type(ccomm, comm));
+      remoteCupdate.insert(std::map<GraphElem,Comm>::value_type(ccomm, Comm()));
+  }
+} // end fillRemoteCommunities
+
+void createCommunityMPIType()
+{
+  CommInfo cinfo;
+
+  MPI_Aint begin, community, size, degree;
+
+  MPI_Get_address(&cinfo, &begin);
+  MPI_Get_address(&cinfo.community, &community);
+  MPI_Get_address(&cinfo.size, &size);
+  MPI_Get_address(&cinfo.degree, &degree);
+
+  int blens[] = { 1, 1, 1 };
+  MPI_Aint displ[] = { community - begin, size - begin, degree - begin };
+  MPI_Datatype types[] = { MPI_GRAPH_TYPE, MPI_GRAPH_TYPE, MPI_WEIGHT_TYPE };
+
+  MPI_Type_create_struct(3, blens, displ, types, &commType);
+  MPI_Type_commit(&commType);
+} // createCommunityMPIType
+
+void destroyCommunityMPIType()
+{
+  MPI_Type_free(&commType);
+} // destroyCommunityMPIType
+
+void updateRemoteCommunities(const Graph &dg, std::vector<Comm> &localCinfo,
+			     const std::map<GraphElem,Comm> &remoteCupdate,
+			     const int me, const int nprocs)
+{
+  const GraphElem base = dg.get_base(me), bound = dg.get_bound(me);
+  std::vector<std::vector<CommInfo>> remoteArray(nprocs);
+  MPI_Comm gcomm = dg.get_comm();
+  
+  // FIXME TODO can we use TBB::concurrent_vector instead,
+  // to make this parallel; first we have to get rid of maps
+  for (std::map<GraphElem,Comm>::const_iterator iter = remoteCupdate.begin(); iter != remoteCupdate.end(); iter++) {
+      const GraphElem i = iter->first;
+      const Comm &curr = iter->second;
+
+      const int tproc = dg.get_owner(i);
+
+#ifdef DEBUG_PRINTF  
+      assert(tproc != me);
+#endif
+      CommInfo rcinfo;
+
+      rcinfo.community = i;
+      rcinfo.size = curr.size;
+      rcinfo.degree = curr.degree;
+
+      remoteArray[tproc].push_back(rcinfo);
+  }
+
+  std::vector<GraphElem> send_sz(nprocs), recv_sz(nprocs);
+
+#ifdef DEBUG_PRINTF  
+  GraphWeight tc = 0.0;
+  const double t0 = MPI_Wtime();
+#endif
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for schedule(runtime)
+#else
+#pragma omp parallel for schedule(static)
+#endif
+  for (int i = 0; i < nprocs; i++) {
+    send_sz[i] = remoteArray[i].size();
+  }
+
+  MPI_Alltoall(send_sz.data(), 1, MPI_GRAPH_TYPE, recv_sz.data(), 
+          1, MPI_GRAPH_TYPE, gcomm);
+
+#ifdef DEBUG_PRINTF  
+  const double t1 = MPI_Wtime();
+  tc += (t1 - t0);
+#endif
+
+  GraphElem rcnt = 0, scnt = 0;
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for shared(recv_sz, send_sz) \
+  reduction(+:rcnt, scnt) schedule(runtime)
+#else
+#pragma omp parallel for shared(recv_sz, send_sz) \
+  reduction(+:rcnt, scnt) schedule(static)
+#endif
+  for (int i = 0; i < nprocs; i++) {
+    rcnt += recv_sz[i];
+    scnt += send_sz[i];
+  }
+#ifdef DEBUG_PRINTF  
+  std::cout << "[" << me << "]Total number of remote communities to update: " << scnt << std::endl;
+#endif
+
+  GraphElem currPos = 0;
+  std::vector<CommInfo> rdata(rcnt);
+
+#ifdef DEBUG_PRINTF  
+  const double t2 = MPI_Wtime();
+#endif
+#if defined(USE_MPI_SENDRECV)
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me)
+          MPI_Sendrecv(remoteArray[i].data(), send_sz[i], commType, i, CommunityDataTag, 
+                  rdata.data() + currPos, recv_sz[i], commType, i, CommunityDataTag, 
+                  gcomm, MPI_STATUS_IGNORE);
+
+      currPos += recv_sz[i];
+  }
+#else
+  std::vector<MPI_Request> sreqs(nprocs), rreqs(nprocs);
+  for (int i = 0; i < nprocs; i++) {
+    if (i != me)
+      MPI_Irecv(rdata.data() + currPos, recv_sz[i], commType, i, 
+              CommunityDataTag, gcomm, &rreqs[i]);
+    else
+      rreqs[i] = MPI_REQUEST_NULL;
+
+    currPos += recv_sz[i];
+  }
+
+  for (int i = 0; i < nprocs; i++) {
+    if (i != me)
+      MPI_Isend(remoteArray[i].data(), send_sz[i], commType, i, 
+              CommunityDataTag, gcomm, &sreqs[i]);
+    else
+      sreqs[i] = MPI_REQUEST_NULL;
+  }
+
+  MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE);
+  MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE);
+#endif
+#ifdef DEBUG_PRINTF  
+  const double t3 = MPI_Wtime();
+  std::cout << "[" << me << "]Update remote community MPI time: " << (t3 - t2) << std::endl;
+#endif
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for shared(rdata, localCinfo) schedule(runtime)
+#else
+#pragma omp parallel for shared(rdata, localCinfo) schedule(dynamic)
+#endif
+  for (GraphElem i = 0; i < rcnt; i++) {
+    const CommInfo &curr = rdata[i];
+
+#ifdef DEBUG_PRINTF  
+    assert(dg.get_owner(curr.community) == me);
+#endif
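+    // senders may report the same community, so the updates must be atomic
+#pragma omp atomic update
+    localCinfo[curr.community-base].size += curr.size;
+#pragma omp atomic update
+    localCinfo[curr.community-base].degree += curr.degree;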
+  }
+} // updateRemoteCommunities
+
+// initial setup before Louvain iteration begins
+#if defined(USE_MPI_RMA)
+void exchangeVertexReqs(const Graph &dg, size_t &ssz, size_t &rsz,
+        std::vector<GraphElem> &ssizes, std::vector<GraphElem> &rsizes, 
+        std::vector<GraphElem> &svdata, std::vector<GraphElem> &rvdata,
+        const int me, const int nprocs, MPI_Win &commwin)
+#else
+void exchangeVertexReqs(const Graph &dg, size_t &ssz, size_t &rsz,
+        std::vector<GraphElem> &ssizes, std::vector<GraphElem> &rsizes, 
+        std::vector<GraphElem> &svdata, std::vector<GraphElem> &rvdata,
+        const int me, const int nprocs)
+#endif
+{
+  const GraphElem base = dg.get_base(me), bound = dg.get_bound(me);
+  const GraphElem nv = dg.get_lnv();
+  MPI_Comm gcomm = dg.get_comm();
+
+#ifdef USE_OPENMP_LOCK
+  std::vector<omp_lock_t> locks(nprocs);
+  for (int i = 0; i < nprocs; i++)
+    omp_init_lock(&locks[i]);
+#endif
+  std::vector<std::unordered_set<GraphElem>> parray(nprocs);
+
+#ifdef USE_OPENMP_LOCK
+#pragma omp parallel default(none), shared(dg, locks, parray)
+#else
+#pragma omp parallel default(none), shared(dg, parray)
+#endif
+  {
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp for schedule(runtime)
+#else
+#pragma omp for schedule(guided)
+#endif
+    for (GraphElem i = 0; i < nv; i++) {
+      GraphElem e0, e1;
+
+      dg.edge_range(i, e0, e1);
+
+      for (GraphElem j = e0; j < e1; j++) {
+	const Edge &edge = dg.get_edge(j);
+	const int tproc = dg.get_owner(edge.tail_);
+
+	if (tproc != me) {
+#ifdef USE_OPENMP_LOCK
+	  omp_set_lock(&locks[tproc]);
+#else
+          lock();
+#endif
+	  parray[tproc].insert(edge.tail_);
+#ifdef USE_OPENMP_LOCK
+	  omp_unset_lock(&locks[tproc]);
+#else
+          unlock();
+#endif
+	}
+      }
+    }
+  }
+
+#ifdef USE_OPENMP_LOCK
+  for (int i = 0; i < nprocs; i++) {
+    omp_destroy_lock(&locks[i]);
+  }
+#endif
+  
+  rsizes.resize(nprocs);
+  ssizes.resize(nprocs);
+  ssz = 0, rsz = 0;
+
+  int pproc = 0;
+  // TODO FIXME parallelize this loop
+  for (std::vector<std::unordered_set<GraphElem>>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) {
+    ssz += iter->size();
+    ssizes[pproc] = iter->size();
+    pproc++;
+  }
+
+  MPI_Alltoall(ssizes.data(), 1, MPI_GRAPH_TYPE, rsizes.data(), 
+          1, MPI_GRAPH_TYPE, gcomm);
+
+  GraphElem rsz_r = 0;
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for shared(rsizes) \
+  reduction(+:rsz_r) schedule(runtime)
+#else
+#pragma omp parallel for shared(rsizes) \
+  reduction(+:rsz_r) schedule(static)
+#endif
+  for (int i = 0; i < nprocs; i++)
+    rsz_r += rsizes[i];
+  rsz = rsz_r;
+  
+  svdata.resize(ssz);
+  rvdata.resize(rsz);
+
+  GraphElem cpos = 0, rpos = 0;
+  pproc = 0;
+
+#if defined(USE_MPI_COLLECTIVES)
+  std::vector<int> scnts(nprocs), rcnts(nprocs), sdispls(nprocs), rdispls(nprocs);
+  
+  for (std::vector<std::unordered_set<GraphElem>>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) {
+      std::copy(iter->begin(), iter->end(), svdata.begin() + cpos);
+      
+      scnts[pproc] = iter->size();
+      rcnts[pproc] = rsizes[pproc];
+      sdispls[pproc] = cpos;
+      rdispls[pproc] = rpos;
+      cpos += iter->size();
+      rpos += rcnts[pproc];
+
+      pproc++;
+  }
+
+  scnts[me] = 0;
+  rcnts[me] = 0;
+  MPI_Alltoallv(svdata.data(), scnts.data(), sdispls.data(), 
+          MPI_GRAPH_TYPE, rvdata.data(), rcnts.data(), rdispls.data(), 
+          MPI_GRAPH_TYPE, gcomm);
+#else
+  std::vector<MPI_Request> rreqs(nprocs), sreqs(nprocs);
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me)
+          MPI_Irecv(rvdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, 
+                  VertexTag, gcomm, &rreqs[i]);
+      else
+          rreqs[i] = MPI_REQUEST_NULL;
+
+      rpos += rsizes[i];
+  }
+
+  for (std::vector<std::unordered_set<GraphElem>>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) {
+      std::copy(iter->begin(), iter->end(), svdata.begin() + cpos);
+
+      if (me != pproc)
+          MPI_Isend(svdata.data() + cpos, iter->size(), MPI_GRAPH_TYPE, pproc, 
+                  VertexTag, gcomm, &sreqs[pproc]);
+      else
+          sreqs[pproc] = MPI_REQUEST_NULL;
+
+      cpos += iter->size();
+      pproc++;
+  }
+
+  MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE);
+  MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE);
+#endif
+
+  std::swap(svdata, rvdata);
+  std::swap(ssizes, rsizes);
+  std::swap(ssz, rsz);
+
+  // create MPI window for communities
+#if defined(USE_MPI_RMA)  
+  GraphElem *ptr = nullptr;
+  MPI_Info info = MPI_INFO_NULL;
+#if defined(USE_MPI_ACCUMULATE)
+  MPI_Info_create(&info);
+  MPI_Info_set(info, "accumulate_ordering", "none");
+  MPI_Info_set(info, "accumulate_ops", "same_op");
+#endif
+  MPI_Win_allocate(rsz*sizeof(GraphElem), sizeof(GraphElem), 
+          info, gcomm, &ptr, &commwin);
+  MPI_Win_lock_all(MPI_MODE_NOCHECK, commwin);
+#endif
+} // exchangeVertexReqs
+
+#if defined(USE_MPI_RMA)
+GraphWeight distLouvainMethod(const int me, const int nprocs, const Graph &dg,
+        size_t &ssz, size_t &rsz, std::vector<GraphElem> &ssizes, std::vector<GraphElem> &rsizes, 
+        std::vector<GraphElem> &svdata, std::vector<GraphElem> &rvdata, const GraphWeight lower, 
+        const GraphWeight thresh, int &iters, MPI_Win &commwin)
+#else
+GraphWeight distLouvainMethod(const int me, const int nprocs, const Graph &dg,
+        size_t &ssz, size_t &rsz, std::vector<GraphElem> &ssizes, std::vector<GraphElem> &rsizes, 
+        std::vector<GraphElem> &svdata, std::vector<GraphElem> &rvdata, const GraphWeight lower, 
+        const GraphWeight thresh, int &iters)
+#endif
+{
+  std::vector<GraphElem> pastComm, currComm, targetComm;
+  std::vector<GraphWeight> vDegree;
+  std::vector<GraphWeight> clusterWeight;
+  std::vector<Comm> localCinfo, localCupdate;
+ 
+  std::unordered_map<GraphElem, GraphElem> remoteComm;
+  std::map<GraphElem,Comm> remoteCinfo, remoteCupdate;
+  
+  const GraphElem nv = dg.get_lnv();
+  MPI_Comm gcomm = dg.get_comm();
+
+  GraphWeight constantForSecondTerm;
+  GraphWeight prevMod = lower;
+  GraphWeight currMod = -1.0;
+  int numIters = 0;
+  
+  distInitLouvain(dg, pastComm, currComm, vDegree, clusterWeight, localCinfo, 
+          localCupdate, constantForSecondTerm, me);
+  targetComm.resize(nv);
+
+#ifdef DEBUG_PRINTF  
+  std::cout << "[" << me << "]constantForSecondTerm: " << constantForSecondTerm << std::endl;
+  if (me == 0)
+      std::cout << "Threshold: " << thresh << std::endl;
+#endif
+  const GraphElem base = dg.get_base(me), bound = dg.get_bound(me);
+
+#ifdef DEBUG_PRINTF  
+  double t0, t1;
+  t0 = MPI_Wtime();
+#endif
+
+  // setup vertices and communities
+#if defined(USE_MPI_RMA)
+  exchangeVertexReqs(dg, ssz, rsz, ssizes, rsizes, 
+          svdata, rvdata, me, nprocs, commwin);
+  
+  // store the remote displacements 
+  std::vector<MPI_Aint> disp(nprocs);
+  MPI_Exscan(ssizes.data(), (GraphElem*)disp.data(), nprocs, MPI_GRAPH_TYPE, 
+          MPI_SUM, gcomm);
+#else
+  exchangeVertexReqs(dg, ssz, rsz, ssizes, rsizes, 
+          svdata, rvdata, me, nprocs);
+#endif
+
+#ifdef DEBUG_PRINTF  
+  t1 = MPI_Wtime();
+  std::cout << "[" << me << "]Initial communication setup time before Louvain iteration (in s): " << (t1 - t0) << std::endl;
+#endif
+ 
+  // start Louvain iteration
+  while(true) {
+#ifdef DEBUG_PRINTF  
+    const double t2 = MPI_Wtime();
+    if (me == 0)
+        std::cout << "Starting Louvain iteration: " << numIters << std::endl;
+#endif
+    numIters++;
+
+#ifdef DEBUG_PRINTF  
+    t0 = MPI_Wtime();
+#endif
+
+#if defined(USE_MPI_RMA)
+    fillRemoteCommunities(dg, me, nprocs, ssz, rsz, ssizes, 
+            rsizes, svdata, rvdata, currComm, localCinfo, 
+            remoteCinfo, remoteComm, remoteCupdate, 
+            commwin, disp);
+#else
+    fillRemoteCommunities(dg, me, nprocs, ssz, rsz, ssizes, 
+            rsizes, svdata, rvdata, currComm, localCinfo, 
+            remoteCinfo, remoteComm, remoteCupdate);
+#endif
+
+#ifdef DEBUG_PRINTF  
+    t1 = MPI_Wtime();
+    std::cout << "[" << me << "]Remote community map size: " << remoteComm.size() << std::endl;
+    std::cout << "[" << me << "]Iteration communication time: " << (t1 - t0) << std::endl;
+#endif
+
+#ifdef DEBUG_PRINTF  
+    t0 = MPI_Wtime();
+#endif
+
+#if defined(OMP_GPU)
+#pragma omp target data map(from: clusterWeight, localCupdate, targetComm) map(to: dg, currComm, vDegree) \
+        map(to: localCinfo, remoteCinfo, remoteComm, remoteCupdate)
+#else
+#pragma omp parallel default(none), shared(clusterWeight, localCupdate, currComm, targetComm, \
+        vDegree, localCinfo, remoteCinfo, remoteComm, pastComm, dg, remoteCupdate), \
+    firstprivate(constantForSecondTerm)
+#endif
+    {
+        distCleanCWandCU(nv, clusterWeight, localCupdate);
+
+#if defined(OMP_GPU)
+#pragma omp target teams distribute parallel for
+#elif defined(OMP_SCHEDULE_RUNTIME)
+#pragma omp for schedule(runtime)
+#else
+#pragma omp for schedule(guided) 
+#endif
+        for (GraphElem i = 0; i < nv; i++) {
+            distExecuteLouvainIteration(i, dg, currComm, targetComm, vDegree, localCinfo, 
+                    localCupdate, remoteComm, remoteCinfo, remoteCupdate,
+                    constantForSecondTerm, clusterWeight, me);
+        }
+    }
+
+#if defined(OMP_GPU)
+#pragma omp target data map(to: localCinfo, localCupdate)
+#else
+#pragma omp parallel default(none), shared(localCinfo, localCupdate)
+#endif
+    {
+        distUpdateLocalCinfo(localCinfo, localCupdate);
+    }
+
+    // communicate remote communities
+    updateRemoteCommunities(dg, localCinfo, remoteCupdate, me, nprocs);
+
+    // compute modularity
+    currMod = distComputeModularity(dg, localCinfo, clusterWeight, constantForSecondTerm, me);
+
+    // exit criteria
+    if (currMod - prevMod < thresh)
+        break;
+
+    prevMod = currMod;
+    if (prevMod < lower)
+        prevMod = lower;
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for default(none) \
+    shared(pastComm, currComm, targetComm) \
+    schedule(runtime)
+#else
+#pragma omp parallel for default(none) \
+    shared(pastComm, currComm, targetComm) \
+    schedule(static)
+#endif
+    for (GraphElem i = 0; i < nv; i++) {
+        GraphElem tmp = pastComm[i];
+        pastComm[i] = currComm[i];
+        currComm[i] = targetComm[i];
+        targetComm[i] = tmp;
+    }
+  } // end of Louvain iteration
+
+#if defined(USE_MPI_RMA)
+  MPI_Win_unlock_all(commwin);
+  MPI_Win_free(&commwin);
+#endif  
+
+  iters = numIters;
+
+  vDegree.clear();
+  pastComm.clear();
+  currComm.clear();
+  targetComm.clear();
+  clusterWeight.clear();
+  localCinfo.clear();
+  localCupdate.clear();
+  
+  return prevMod;
+} // distLouvainMethod plain
+
+#endif // __DSPL
diff --git a/miniVite/dspl_gpu_kernel.hpp b/miniVite/dspl_gpu_kernel.hpp
new file mode 100644
index 0000000..1cf9c70
--- /dev/null
+++ b/miniVite/dspl_gpu_kernel.hpp
@@ -0,0 +1,1447 @@
+// ***********************************************************************
+//
+//                              miniVite
+//
+// ***********************************************************************
+//
+//       Copyright (2018) Battelle Memorial Institute
+//                      All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions
+// are met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the copyright holder nor the names of its
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+// COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+// BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+// ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+// POSSIBILITY OF SUCH DAMAGE.
+//
+// ************************************************************************ 
+
+#pragma once
+#ifndef DSPL_HPP
+#define DSPL_HPP
+
+#include <algorithm>
+#include <fstream>
+#include <functional>
+#include <iostream>
+#include <list>
+#include <numeric>
+#include <vector>
+#include <unordered_map>
+#include <unordered_set>
+#include <map>
+
+#include <mpi.h>
+#include <omp.h>
+
+#include "graph.hpp"
+#include "utils.hpp"
+
+struct Comm {
+  GraphElem size;
+  GraphWeight degree;
+
+  Comm() : size(0), degree(0.0) {}
+};
+
+struct CommInfo {
+    GraphElem community;
+    GraphElem size;
+    GraphWeight degree;
+};
+
+const int SizeTag           = 1;
+const int VertexTag         = 2;
+const int CommunityTag      = 3;
+const int CommunitySizeTag  = 4;
+const int CommunityDataTag  = 5;
+
+static MPI_Datatype commType;
+
+void distSumVertexDegree(const Graph &g, std::vector<GraphWeight> &vDegree, std::vector<Comm> &localCinfo)
+{
+  const GraphElem nv = g.get_lnv();
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for default(none), shared(g, vDegree, localCinfo), firstprivate(nv) schedule(runtime)
+#else
+#pragma omp parallel for default(none), shared(g, vDegree, localCinfo), firstprivate(nv) schedule(guided)
+#endif
+  for (GraphElem i = 0; i < nv; i++) {
+    GraphElem e0, e1;
+    GraphWeight tw = 0.0;
+
+    g.edge_range(i, e0, e1);
+
+    for (GraphElem k = e0; k < e1; k++) {
+      const Edge &edge = g.get_edge(k);
+      tw += edge.weight_;
+    }
+
+    vDegree[i] = tw;
+   
+    localCinfo[i].degree = tw;
+    localCinfo[i].size = 1L;
+  }
+} // distSumVertexDegree
+
+GraphWeight distCalcConstantForSecondTerm(const std::vector<GraphWeight> &vDegree, MPI_Comm gcomm)
+{
+  GraphWeight totalEdgeWeightTwice = 0.0;
+  GraphWeight localWeight = 0.0;
+  int me = -1;
+
+  const size_t vsz = vDegree.size();
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for default(none), shared(vDegree), firstprivate(vsz), reduction(+: localWeight) schedule(runtime)
+#else
+#pragma omp parallel for default(none), shared(vDegree), firstprivate(vsz), reduction(+: localWeight) schedule(static)
+#endif  
+  for (GraphElem i = 0; i < vsz; i++)
+    localWeight += vDegree[i]; // Local reduction
+
+  // Global reduction
+  MPI_Allreduce(&localWeight, &totalEdgeWeightTwice, 1, 
+          MPI_WEIGHT_TYPE, MPI_SUM, gcomm);
+
+  return (1.0 / static_cast<GraphWeight>(totalEdgeWeightTwice));
+} // distCalcConstantForSecondTerm
+
+void distInitComm(std::vector<GraphElem> &pastComm, std::vector<GraphElem> &currComm, const GraphElem base)
+{
+  const size_t csz = currComm.size();
+
+#ifdef DEBUG_PRINTF  
+  assert(csz == pastComm.size());
+#endif
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for default(none), shared(pastComm, currComm), firstprivate(csz, base), schedule(runtime)
+#else
+#pragma omp parallel for default(none), shared(pastComm, currComm), firstprivate(csz, base), schedule(static)
+#endif
+  for (GraphElem i = 0L; i < csz; i++) {
+    pastComm[i] = i + base;
+    currComm[i] = i + base;
+  }
+} // distInitComm
+
+void distInitLouvain(const Graph &dg, std::vector<GraphElem> &pastComm, 
+        std::vector<GraphElem> &currComm, std::vector<GraphWeight> &vDegree, 
+        std::vector<GraphWeight> &clusterWeight, std::vector<Comm> &localCinfo, 
+        std::vector<Comm> &localCupdate, GraphWeight &constantForSecondTerm,
+        const int me)
+{
+  const GraphElem base = dg.get_base(me);
+  const GraphElem nv = dg.get_lnv();
+  MPI_Comm gcomm = dg.get_comm();
+
+  vDegree.resize(nv);
+  pastComm.resize(nv);
+  currComm.resize(nv);
+  clusterWeight.resize(nv);
+  localCinfo.resize(nv);
+  localCupdate.resize(nv);
+ 
+  distSumVertexDegree(dg, vDegree, localCinfo);
+  constantForSecondTerm = distCalcConstantForSecondTerm(vDegree, gcomm);
+
+  distInitComm(pastComm, currComm, base);
+} // distInitLouvain
+
+struct clmap_t {
+  GraphElem f;
+  GraphElem s;
+};
+#define CLMAP_MAX_NUM 32
+#define COUNT_MAX_NUM 32
+
+GraphElem distGetMaxIndex(clmap_t *clmap, int &clmap_size, GraphWeight *counter, int &counter_size,
+			  const GraphWeight selfLoop, const Comm *localCinfo, 
+			  const GraphWeight vDegree, 
+                          const GraphElem currSize, const GraphWeight currDegree, const GraphElem currComm,
+			  const GraphElem base, const GraphElem bound, const GraphWeight constant)
+{
+  //std::unordered_map<GraphElem, GraphElem>::const_iterator storedAlready;
+  clmap_t *storedAlready;
+  GraphElem maxIndex = currComm;
+  GraphWeight curGain = 0.0, maxGain = 0.0;
+  //GraphWeight eix = static_cast<GraphWeight>(counter[0]) - static_cast<GraphWeight>(selfLoop);
+  GraphWeight eix = counter[0] - selfLoop;
+
+  GraphWeight ax = currDegree - vDegree;
+  GraphWeight eiy = 0.0, ay = 0.0;
+
+  GraphElem maxSize = currSize; 
+  GraphElem size = 0;
+
+  //storedAlready = clmap.begin();
+  storedAlready = clmap;
+#ifdef DEBUG_PRINTF  
+  //assert(storedAlready != clmap.end());
+#endif
+  do {
+      //if (currComm != storedAlready->first) {
+      if (currComm != storedAlready->f) {
+
+          // is_local, direct access local info
+          //assert((storedAlready->first >= base) && (storedAlready->first < bound));
+          //ay = localCinfo[storedAlready->first-base].degree;
+          //size = localCinfo[storedAlready->first - base].size;   
+          //assert((storedAlready->f >= base) && (storedAlready->f < bound));
+          ay = localCinfo[storedAlready->f-base].degree;
+          size = localCinfo[storedAlready->f - base].size;   
+
+          //eiy = counter[storedAlready->second];
+          if (storedAlready->s < counter_size)
+            eiy = counter[storedAlready->s];
+
+          curGain = 2.0 * (eiy - eix) - 2.0 * vDegree * (ay - ax) * constant;
+
+          if ((curGain > maxGain) ||
+                  //((curGain == maxGain) && (curGain != 0.0) && (storedAlready->first < maxIndex))) {
+                  ((curGain == maxGain) && (curGain != 0.0) && (storedAlready->f < maxIndex))) {
+              maxGain = curGain;
+              //maxIndex = storedAlready->first;
+              maxIndex = storedAlready->f;
+              maxSize = size;
+          }
+      }
+      storedAlready++;
+  //} while (storedAlready != clmap.end());
+  } while (storedAlready != clmap + clmap_size);
+
+  if ((maxSize == 1) && (currSize == 1) && (maxIndex > currComm))
+    maxIndex = currComm;
+
+  return maxIndex;
+} // distGetMaxIndex
+
+GraphWeight distBuildLocalMapCounter(const GraphElem e0, const GraphElem e1, clmap_t *clmap, int &clmap_size, 
+				   GraphWeight *counter, int &counter_size, const Edge *edge_list, 
+                                   const GraphElem *currComm,
+	                           const GraphElem vertex, const GraphElem base, const GraphElem bound)
+{
+  GraphElem numUniqueClusters = 1L;
+  GraphWeight selfLoop = 0;
+  //std::unordered_map<GraphElem, GraphElem>::const_iterator storedAlready;
+  clmap_t *storedAlready;
+
+  for (GraphElem j = e0; j < e1; j++) {
+        
+    const Edge &edge = edge_list[j];
+    const GraphElem &tail_ = edge.tail_;
+    const GraphWeight &weight = edge.weight_;
+    GraphElem tcomm;
+
+    if (tail_ == vertex + base)
+      selfLoop += weight;
+
+    // is_local, direct access local std::vector<GraphElem>
+    tcomm = currComm[tail_ - base];
+
+    //storedAlready = clmap.find(tcomm);
+    storedAlready = clmap;
+    for (int i = 0; i < clmap_size; i++, storedAlready++) {
+      if (clmap[i].f == tcomm)
+        break;
+    }
+    
+    //if (storedAlready != clmap.end())
+    //  counter[storedAlready->second] += weight;
+    if (storedAlready != clmap + clmap_size && storedAlready->s < counter_size)
+      counter[storedAlready->s] += weight;
+    else {
+        //clmap.insert(std::unordered_map<GraphElem, GraphElem>::value_type(tcomm, numUniqueClusters));
+        if (clmap_size < CLMAP_MAX_NUM) {
+          clmap[clmap_size].f = tcomm;
+          clmap[clmap_size].s = numUniqueClusters;
+          clmap_size++;
+        }
+        //counter.push_back(weight);
+        if (counter_size < COUNT_MAX_NUM) {
+          counter[counter_size] = weight;
+          counter_size++;
+        }
+        numUniqueClusters++;
+    }
+  }
+
+  return selfLoop;
+} // distBuildLocalMapCounter
+
+void distExecuteLouvainIteration(const GraphElem i, const GraphElem *edge_indices,
+                                 const GraphElem *parts, const Edge *edge_list,
+                                 const GraphElem *currComm,
+				 GraphElem *targetComm, const GraphWeight *vDegree,
+                                 Comm *localCinfo, Comm *localCupdate,
+                                 const GraphWeight constantForSecondTerm,
+                                 GraphWeight *clusterWeight, const int me)
+{
+  GraphElem localTarget = -1;
+  GraphElem e0, e1; GraphWeight selfLoop = 0.0;
+  //std::unordered_map<GraphElem, GraphElem> clmap;
+  clmap_t clmap[CLMAP_MAX_NUM];
+  int clmap_size = 0;
+  //std::vector<GraphWeight> counter;
+  GraphWeight counter[COUNT_MAX_NUM];
+  int counter_size = 0;
+
+  const GraphElem base = parts[me], bound = parts[me+1];
+  const GraphElem cc = currComm[i];
+  GraphWeight ccDegree;
+  GraphElem ccSize;  
+  bool currCommIsLocal = false; 
+  bool targetCommIsLocal = false;
+
+  // Current Community is local
+#ifdef DEBUG_PRINTF  
+  assert(cc >= base && cc < bound);
+#endif
+  ccDegree=localCinfo[cc-base].degree;
+  ccSize=localCinfo[cc-base].size;
+  currCommIsLocal=true;
+
+  e0 = edge_indices[i];
+  e1 = edge_indices[i+1];
+
+  if (e0 != e1) {
+    //clmap.insert(std::unordered_map<GraphElem, GraphElem>::value_type(cc, 0));
+    clmap[0].f = cc;
+    clmap[0].s = 0;
+    clmap_size++;
+    //counter.push_back(0.0);
+    counter[0] = 0.0;
+    counter_size++;
+
+    selfLoop =  distBuildLocalMapCounter(e0, e1, clmap, clmap_size, counter, counter_size, edge_list, 
+                    currComm, i, base, bound);
+
+    clusterWeight[i] += counter[0];
+
+    localTarget = distGetMaxIndex(clmap, clmap_size, counter, counter_size, selfLoop, localCinfo,
+                    vDegree[i], ccSize, ccDegree, cc, base, bound, constantForSecondTerm);
+  }
+  else
+    localTarget = cc;
+
+   // is the Target Local?
+   //assert(localTarget >= base && localTarget < bound);
+   targetCommIsLocal = true;
+  
+  // current and target comm are local - atomic updates to vectors
+  if ((localTarget != cc) && (localTarget != -1) && currCommIsLocal && targetCommIsLocal) {
+        
+#ifdef DEBUG_PRINTF  
+        assert(localTarget >= base && localTarget < bound);
+        assert(cc >= base && cc < bound);
+	//assert( cc - base < localCupdate.size()); 	
+	//assert( localTarget - base < localCupdate.size()); 	
+#endif
+        #pragma omp atomic update
+        localCupdate[localTarget-base].degree += vDegree[i];
+        #pragma omp atomic update
+        localCupdate[localTarget-base].size++;
+        #pragma omp atomic update
+        localCupdate[cc-base].degree -= vDegree[i];
+        #pragma omp atomic update
+        localCupdate[cc-base].size--;
+     }	
+
+#ifdef DEBUG_PRINTF  
+  assert(localTarget != -1);
+#endif
+  targetComm[i] = localTarget;
+} // distExecuteLouvainIteration
+
+GraphWeight distComputeModularity(const Graph &g, Comm *localCinfo,
+			     const GraphWeight *clusterWeight,
+			     const GraphWeight constantForSecondTerm,
+			     const int me)
+{
+  const GraphElem nv = g.get_lnv();
+  MPI_Comm gcomm = g.get_comm();
+
+  GraphWeight le_la_xx[2];
+  GraphWeight e_a_xx[2] = {0.0, 0.0};
+  GraphWeight le_xx = 0.0, la2_x = 0.0;
+
+#ifdef DEBUG_PRINTF  
+  //assert((clusterWeight.size() == nv));
+#endif
+
+#if defined(OMP_GPU)
+#pragma omp target teams distribute parallel for map(to: clusterWeight[0:nv], localCinfo[0:nv]) reduction(+: le_xx), reduction(+: la2_x)
+#elif defined(OMP_SCHEDULE_RUNTIME)
+#pragma omp parallel for default(none), shared(clusterWeight, localCinfo), \
+  reduction(+: le_xx), reduction(+: la2_x) schedule(runtime)
+#else
+#pragma omp parallel for default(none), shared(clusterWeight, localCinfo), \
+  reduction(+: le_xx), reduction(+: la2_x) schedule(static)
+#endif
+  for (GraphElem i = 0L; i < nv; i++) {
+    le_xx += clusterWeight[i];
+    //la2_x += static_cast<GraphWeight>(localCinfo[i].degree) * static_cast<GraphWeight>(localCinfo[i].degree); 
+    la2_x += localCinfo[i].degree * localCinfo[i].degree; 
+  } 
+  le_la_xx[0] = le_xx;
+  le_la_xx[1] = la2_x;
+
+#ifdef DEBUG_PRINTF  
+  const double t0 = MPI_Wtime();
+#endif
+
+  MPI_Allreduce(le_la_xx, e_a_xx, 2, MPI_WEIGHT_TYPE, MPI_SUM, gcomm);
+
+#ifdef DEBUG_PRINTF  
+  const double t1 = MPI_Wtime();
+#endif
+
+  GraphWeight currMod = (e_a_xx[0] * constantForSecondTerm) - 
+      (e_a_xx[1] * constantForSecondTerm * constantForSecondTerm);
+#ifdef DEBUG_PRINTF  
+  std::cout << "[" << me << "]le_xx: " << le_xx << ", la2_x: " << la2_x << std::endl;
+  std::cout << "[" << me << "]e_xx: " << e_a_xx[0] << ", a2_x: " << e_a_xx[1] << ", currMod: " << currMod << std::endl;
+  std::cout << "[" << me << "]Reduction time: " << (t1 - t0) << std::endl;
+#endif
+
+  return currMod;
+} // distComputeModularity
+
+void distUpdateLocalCinfo(const GraphElem nv, Comm *localCinfo, const Comm *localCupdate)
+{
+#if defined(OMP_GPU)
+#pragma omp target teams distribute parallel for map(to                        \
+                                                     : localCupdate [0:nv])    \
+    map(tofrom                                                                 \
+        : localCinfo [0:nv])
+#elif defined(OMP_SCHEDULE_RUNTIME)
+#pragma omp for schedule(runtime)
+#else
+#pragma omp for schedule(static)
+#endif
+    for (GraphElem i = 0L; i < nv; i++) {
+        localCinfo[i].size += localCupdate[i].size;
+        localCinfo[i].degree += localCupdate[i].degree;
+    }
+}
+
+void distCleanCWandCU(const GraphElem nv, GraphWeight *clusterWeight,
+        Comm *localCupdate)
+{
+#if defined(OMP_GPU)
+#pragma omp target teams distribute parallel for map(from                      \
+                                                     : clusterWeight [0:nv],   \
+                                                       localCupdate [0:nv])
+#elif defined(OMP_SCHEDULE_RUNTIME)
+#pragma omp for schedule(runtime)
+#else
+#pragma omp for schedule(static)
+#endif
+    for (GraphElem i = 0L; i < nv; i++) {
+        clusterWeight[i] = 0;
+        localCupdate[i].degree = 0;
+        localCupdate[i].size = 0;
+    }
+} // distCleanCWandCU
+
+#if defined(USE_MPI_RMA)
+void fillRemoteCommunities(const Graph &dg, const int me, const int nprocs,
+        const size_t &ssz, const size_t &rsz, const std::vector<GraphElem> &ssizes, 
+        const std::vector<GraphElem> &rsizes, const std::vector<GraphElem> &svdata, 
+        const std::vector<GraphElem> &rvdata, const std::vector<GraphElem> &currComm, 
+        const std::vector<Comm> &localCinfo, std::map<GraphElem,Comm> &remoteCinfo, 
+        std::unordered_map<GraphElem, GraphElem> &remoteComm, std::map<GraphElem,Comm> &remoteCupdate, 
+        const MPI_Win &commwin, const std::vector<MPI_Aint> &disp)
+#else
+void fillRemoteCommunities(const Graph &dg, const int me, const int nprocs,
+        const size_t &ssz, const size_t &rsz, const std::vector<GraphElem> &ssizes, 
+        const std::vector<GraphElem> &rsizes, const std::vector<GraphElem> &svdata, 
+        const std::vector<GraphElem> &rvdata, const std::vector<GraphElem> &currComm, 
+        const std::vector<Comm> &localCinfo, std::map<GraphElem,Comm> &remoteCinfo, 
+        std::unordered_map<GraphElem, GraphElem> &remoteComm, std::map<GraphElem,Comm> &remoteCupdate)
+#endif
+{
+#if defined(USE_MPI_RMA)
+    std::vector<GraphElem> scdata(ssz);
+#else
+    std::vector<GraphElem> rcdata(rsz), scdata(ssz);
+#endif
+  GraphElem spos, rpos;
+#if defined(REPLACE_STL_UOSET_WITH_VECTOR)
+  std::vector< std::vector< GraphElem > > rcinfo(nprocs);
+#else
+  std::vector<std::unordered_set<GraphElem> > rcinfo(nprocs);
+#endif
+
+#if defined(USE_MPI_SENDRECV)
+#else
+  std::vector<MPI_Request> rreqs(nprocs), sreqs(nprocs);
+#endif
+
+#ifdef DEBUG_PRINTF  
+  double t0, t1, ta = 0.0;
+#endif
+
+  const GraphElem base = dg.get_base(me), bound = dg.get_bound(me);
+  const GraphElem nv = dg.get_lnv();
+  MPI_Comm gcomm = dg.get_comm();
+
+  // Collects Communities of local vertices for remote nodes
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for shared(svdata, scdata, currComm) schedule(runtime)
+#else
+#pragma omp parallel for shared(svdata, scdata, currComm) schedule(static)
+#endif
+  for (GraphElem i = 0; i < ssz; i++) {
+    const GraphElem vertex = svdata[i];
+#ifdef DEBUG_PRINTF  
+    assert((vertex >= base) && (vertex < bound));
+#endif
+    const GraphElem comm = currComm[vertex - base];
+    scdata[i] = comm;
+  }
+
+  std::vector<GraphElem> rcsizes(nprocs), scsizes(nprocs);
+  std::vector<CommInfo> sinfo, rinfo;
+
+#ifdef DEBUG_PRINTF  
+  t0 = MPI_Wtime();
+#endif
+  spos = 0;
+  rpos = 0;
+#if defined(USE_MPI_COLLECTIVES)
+  std::vector<int> scnts(nprocs), rcnts(nprocs), sdispls(nprocs), rdispls(nprocs);
+  for (int i = 0; i < nprocs; i++) {
+      scnts[i] = ssizes[i];
+      rcnts[i] = rsizes[i];
+      sdispls[i] = spos;
+      rdispls[i] = rpos;
+      spos += scnts[i];
+      rpos += rcnts[i];
+  }
+  scnts[me] = 0;
+  rcnts[me] = 0;
+  MPI_Alltoallv(scdata.data(), scnts.data(), sdispls.data(), 
+          MPI_GRAPH_TYPE, rcdata.data(), rcnts.data(), rdispls.data(), 
+          MPI_GRAPH_TYPE, gcomm);
+#elif defined(USE_MPI_RMA)
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me) {
+#if defined(USE_MPI_ACCUMULATE)
+          MPI_Accumulate(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, 
+                  disp[i], ssizes[i], MPI_GRAPH_TYPE, MPI_REPLACE, commwin);
+#else
+          MPI_Put(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, 
+                  disp[i], ssizes[i], MPI_GRAPH_TYPE, commwin);
+#endif
+      }
+      spos += ssizes[i];
+      rpos += rsizes[i];
+  }
+#elif defined(USE_MPI_SENDRECV)
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me)
+          MPI_Sendrecv(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 
+                  rcdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 
+                  gcomm, MPI_STATUS_IGNORE);
+
+      spos += ssizes[i];
+      rpos += rsizes[i];
+  }
+#else
+  for (int i = 0; i < nprocs; i++) {
+    if (i != me)
+      MPI_Irecv(rcdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, 
+              CommunityTag, gcomm, &rreqs[i]);
+    else
+      rreqs[i] = MPI_REQUEST_NULL;
+
+    rpos += rsizes[i];
+  }
+  for (int i = 0; i < nprocs; i++) {
+    if (i != me)
+      MPI_Isend(scdata.data() + spos, ssizes[i], MPI_GRAPH_TYPE, i, 
+              CommunityTag, gcomm, &sreqs[i]);
+    else
+      sreqs[i] = MPI_REQUEST_NULL;
+
+    spos += ssizes[i];
+  }
+
+  MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE);
+  MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE);
+#endif
+#ifdef DEBUG_PRINTF  
+  t1 = MPI_Wtime();
+  ta += (t1 - t0);
+#endif
+
+  // reserve vectors
+#if defined(REPLACE_STL_UOSET_WITH_VECTOR)
+  for (GraphElem i = 0; i < nprocs; i++) {
+      rcinfo[i].reserve(rpos);
+  }
+#endif
+
+  // fetch baseptr from MPI window
+#if defined(USE_MPI_RMA)
+  MPI_Win_flush_all(commwin);
+  MPI_Barrier(gcomm);
+
+  GraphElem *rcbuf = nullptr;
+  int flag = 0;
+  MPI_Win_get_attr(commwin, MPI_WIN_BASE, &rcbuf, &flag);
+#endif
+
+  remoteComm.clear();
+  for (GraphElem i = 0; i < rpos; i++) {
+
+#if defined(USE_MPI_RMA)
+    const GraphElem comm = rcbuf[i];
+#else
+    const GraphElem comm = rcdata[i];
+#endif
+
+    remoteComm.insert(std::unordered_map<GraphElem, GraphElem>::value_type(rvdata[i], comm));
+    const int tproc = dg.get_owner(comm);
+
+    if (tproc != me)
+#if defined(REPLACE_STL_UOSET_WITH_VECTOR)
+      rcinfo[tproc].emplace_back(comm);
+#else
+      rcinfo[tproc].insert(comm);
+#endif
+  }
+
+  for (GraphElem i = 0; i < nv; i++) {
+    const GraphElem comm = currComm[i];
+    const int tproc = dg.get_owner(comm);
+
+    if (tproc != me)
+#if defined(REPLACE_STL_UOSET_WITH_VECTOR)
+      rcinfo[tproc].emplace_back(comm);
+#else
+      rcinfo[tproc].insert(comm);
+#endif
+  }
+
+#ifdef DEBUG_PRINTF  
+  t0 = MPI_Wtime();
+#endif
+  GraphElem stcsz = 0, rtcsz = 0;
+  
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for shared(scsizes, rcinfo) \
+  reduction(+:stcsz) schedule(runtime)
+#else
+#pragma omp parallel for shared(scsizes, rcinfo) \
+  reduction(+:stcsz) schedule(static)
+#endif
+  for (int i = 0; i < nprocs; i++) {
+    scsizes[i] = rcinfo[i].size();
+    stcsz += scsizes[i];
+  }
+
+  MPI_Alltoall(scsizes.data(), 1, MPI_GRAPH_TYPE, rcsizes.data(), 
+          1, MPI_GRAPH_TYPE, gcomm);
+
+#ifdef DEBUG_PRINTF  
+  t1 = MPI_Wtime();
+  ta += (t1 - t0);
+#endif
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for shared(rcsizes) \
+  reduction(+:rtcsz) schedule(runtime)
+#else
+#pragma omp parallel for shared(rcsizes) \
+  reduction(+:rtcsz) schedule(static)
+#endif
+  for (int i = 0; i < nprocs; i++) {
+    rtcsz += rcsizes[i];
+  }
+
+#ifdef DEBUG_PRINTF  
+  std::cout << "[" << me << "]Total communities to receive: " << rtcsz << std::endl;
+#endif
+#if defined(USE_MPI_COLLECTIVES)
+  std::vector<GraphElem> rcomms(rtcsz), scomms(stcsz);
+#else
+#if defined(REPLACE_STL_UOSET_WITH_VECTOR)
+  std::vector<GraphElem> rcomms(rtcsz);
+#else
+  std::vector<GraphElem> rcomms(rtcsz), scomms(stcsz);
+#endif
+#endif
+  sinfo.resize(rtcsz);
+  rinfo.resize(stcsz);
+
+#ifdef DEBUG_PRINTF  
+  t0 = MPI_Wtime();
+#endif
+  spos = 0;
+  rpos = 0;
+#if defined(USE_MPI_COLLECTIVES)
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me) {
+          std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos);
+      }
+      scnts[i] = scsizes[i];
+      rcnts[i] = rcsizes[i];
+      sdispls[i] = spos;
+      rdispls[i] = rpos;
+      spos += scnts[i];
+      rpos += rcnts[i];
+  }
+  scnts[me] = 0;
+  rcnts[me] = 0;
+  MPI_Alltoallv(scomms.data(), scnts.data(), sdispls.data(), 
+          MPI_GRAPH_TYPE, rcomms.data(), rcnts.data(), rdispls.data(), 
+          MPI_GRAPH_TYPE, gcomm);
+
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me) {
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo, rdispls), \
+          firstprivate(i), schedule(runtime) /*, if(rcsizes[i] >= 1000) */
+#else
+#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo, rdispls), \
+          firstprivate(i), schedule(guided) /*, if(rcsizes[i] >= 1000) */
+#endif
+          for (GraphElem j = 0; j < rcsizes[i]; j++) {
+              const GraphElem comm = rcomms[rdispls[i] + j];
+              sinfo[rdispls[i] + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree};
+          }
+      }
+  }
+  
+  MPI_Alltoallv(sinfo.data(), rcnts.data(), rdispls.data(), 
+          commType, rinfo.data(), scnts.data(), sdispls.data(), 
+          commType, gcomm);
+#else
+#if !defined(USE_MPI_SENDRECV)
+  std::vector<MPI_Request> rcreqs(nprocs);
+#endif
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me) {
+#if defined(USE_MPI_SENDRECV)
+#if defined(REPLACE_STL_UOSET_WITH_VECTOR)
+          MPI_Sendrecv(rcinfo[i].data(), scsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 
+                  rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 
+                  gcomm, MPI_STATUS_IGNORE);
+#else
+          std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos);
+          MPI_Sendrecv(scomms.data() + spos, scsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 
+                  rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, CommunityTag, 
+                  gcomm, MPI_STATUS_IGNORE);
+#endif
+#else
+          MPI_Irecv(rcomms.data() + rpos, rcsizes[i], MPI_GRAPH_TYPE, i, 
+                  CommunityTag, gcomm, &rreqs[i]);
+#if defined(REPLACE_STL_UOSET_WITH_VECTOR)
+          MPI_Isend(rcinfo[i].data(), scsizes[i], MPI_GRAPH_TYPE, i, 
+                  CommunityTag, gcomm, &sreqs[i]);
+#else
+          std::copy(rcinfo[i].begin(), rcinfo[i].end(), scomms.data() + spos);
+          MPI_Isend(scomms.data() + spos, scsizes[i], MPI_GRAPH_TYPE, i, 
+                  CommunityTag, gcomm, &sreqs[i]);
+#endif
+#endif
+      }
+      else {
+#if !defined(USE_MPI_SENDRECV)
+          rreqs[i] = MPI_REQUEST_NULL;
+          sreqs[i] = MPI_REQUEST_NULL;
+#endif
+      }
+      rpos += rcsizes[i];
+      spos += scsizes[i];
+  }
+
+  spos = 0;
+  rpos = 0;
+          
+  // poke progress on last isend/irecvs
+#if !defined(USE_MPI_COLLECTIVES) && !defined(USE_MPI_SENDRECV) && defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP)
+  int tf = 0, id = 0;
+  MPI_Testany(nprocs, sreqs.data(), &id, &tf, MPI_STATUS_IGNORE);
+#endif
+
+#if !defined(USE_MPI_COLLECTIVES) && !defined(USE_MPI_SENDRECV) && !defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP)
+  MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE);
+  MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE);
+#endif
+
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me) {
+#if defined(USE_MPI_SENDRECV)
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \
+          firstprivate(i, rpos, base), schedule(runtime)
+#else
+#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \
+          firstprivate(i, rpos, base), schedule(guided)
+#endif
+          for (GraphElem j = 0; j < rcsizes[i]; j++) {
+              const GraphElem comm = rcomms[rpos + j];
+              sinfo[rpos + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree};
+          }
+          
+          MPI_Sendrecv(sinfo.data() + rpos, rcsizes[i], commType, i, CommunityDataTag, 
+                  rinfo.data() + spos, scsizes[i], commType, i, CommunityDataTag, 
+                  gcomm, MPI_STATUS_IGNORE);
+#else
+          MPI_Irecv(rinfo.data() + spos, scsizes[i], commType, i, CommunityDataTag, 
+                  gcomm, &rcreqs[i]);
+
+          // poke progress on last isend/irecvs
+#if defined(POKE_PROGRESS_FOR_COMMUNITY_SENDRECV_IN_LOOP)
+          int sflag = 0, rflag = 0, done = 0;
+          while (!done) {
+              MPI_Test(&sreqs[i], &sflag, MPI_STATUS_IGNORE);
+              MPI_Test(&rreqs[i], &rflag, MPI_STATUS_IGNORE);
+              if (sflag && rflag)
+                  done = 1;
+          }
+#endif
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \
+          firstprivate(i, rpos, base), schedule(runtime)
+#else
+#pragma omp parallel for default(none), shared(rcsizes, rcomms, localCinfo, sinfo), \
+          firstprivate(i, rpos, base), schedule(guided)
+#endif
+          for (GraphElem j = 0; j < rcsizes[i]; j++) {
+              const GraphElem comm = rcomms[rpos + j];
+              sinfo[rpos + j] = {comm, localCinfo[comm-base].size, localCinfo[comm-base].degree};
+          }
+
+          MPI_Isend(sinfo.data() + rpos, rcsizes[i], commType, i, 
+                  CommunityDataTag, gcomm, &sreqs[i]);
+#endif
+      }
+      else {
+#if !defined(USE_MPI_SENDRECV)
+          rcreqs[i] = MPI_REQUEST_NULL;
+          sreqs[i] = MPI_REQUEST_NULL;
+#endif
+      }
+      rpos += rcsizes[i];
+      spos += scsizes[i];
+  }
+
+#if !defined(USE_MPI_SENDRECV)
+  MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE);
+  MPI_Waitall(nprocs, rcreqs.data(), MPI_STATUSES_IGNORE);
+#endif
+
+#endif
+
+#ifdef DEBUG_PRINTF  
+  t1 = MPI_Wtime();
+  ta += (t1 - t0);
+#endif
+
+  remoteCinfo.clear();
+  remoteCupdate.clear();
+
+  for (GraphElem i = 0; i < stcsz; i++) {
+      const GraphElem ccomm = rinfo[i].community;
+
+      Comm comm;
+
+      comm.size = rinfo[i].size;
+      comm.degree = rinfo[i].degree;
+
+      remoteCinfo.insert(std::map<GraphElem,Comm>::value_type(ccomm, comm));
+      remoteCupdate.insert(std::map<GraphElem,Comm>::value_type(ccomm, Comm()));
+  }
+} // end fillRemoteCommunities
+
+void createCommunityMPIType()
+{
+  CommInfo cinfo;
+
+  MPI_Aint begin, community, size, degree;
+
+  MPI_Get_address(&cinfo, &begin);
+  MPI_Get_address(&cinfo.community, &community);
+  MPI_Get_address(&cinfo.size, &size);
+  MPI_Get_address(&cinfo.degree, &degree);
+
+  int blens[] = { 1, 1, 1 };
+  MPI_Aint displ[] = { community - begin, size - begin, degree - begin };
+  MPI_Datatype types[] = { MPI_GRAPH_TYPE, MPI_GRAPH_TYPE, MPI_WEIGHT_TYPE };
+
+  MPI_Type_create_struct(3, blens, displ, types, &commType);
+  MPI_Type_commit(&commType);
+} // createCommunityMPIType
+
+void destroyCommunityMPIType()
+{
+  MPI_Type_free(&commType);
+} // destroyCommunityMPIType
+
+void updateRemoteCommunities(const Graph &dg, std::vector<Comm> &localCinfo,
+			     const std::map<GraphElem,Comm> &remoteCupdate,
+			     const int me, const int nprocs)
+{
+  const GraphElem base = dg.get_base(me), bound = dg.get_bound(me);
+  std::vector<std::vector<CommInfo>> remoteArray(nprocs);
+  MPI_Comm gcomm = dg.get_comm();
+  
+  // FIXME TODO can we use TBB::concurrent_vector instead,
+  // to make this parallel; first we have to get rid of maps
+  for (std::map<GraphElem,Comm>::const_iterator iter = remoteCupdate.begin(); iter != remoteCupdate.end(); iter++) {
+      const GraphElem i = iter->first;
+      const Comm &curr = iter->second;
+
+      const int tproc = dg.get_owner(i);
+
+#ifdef DEBUG_PRINTF  
+      assert(tproc != me);
+#endif
+      CommInfo rcinfo;
+
+      rcinfo.community = i;
+      rcinfo.size = curr.size;
+      rcinfo.degree = curr.degree;
+
+      remoteArray[tproc].push_back(rcinfo);
+  }
+
+  std::vector<GraphElem> send_sz(nprocs), recv_sz(nprocs);
+
+#ifdef DEBUG_PRINTF  
+  GraphWeight tc = 0.0;
+  const double t0 = MPI_Wtime();
+#endif
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for schedule(runtime)
+#else
+#pragma omp parallel for schedule(static)
+#endif
+  for (int i = 0; i < nprocs; i++) {
+    send_sz[i] = remoteArray[i].size();
+  }
+
+  MPI_Alltoall(send_sz.data(), 1, MPI_GRAPH_TYPE, recv_sz.data(), 
+          1, MPI_GRAPH_TYPE, gcomm);
+
+#ifdef DEBUG_PRINTF  
+  const double t1 = MPI_Wtime();
+  tc += (t1 - t0);
+#endif
+
+  GraphElem rcnt = 0, scnt = 0;
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for shared(recv_sz, send_sz) \
+  reduction(+:rcnt, scnt) schedule(runtime)
+#else
+#pragma omp parallel for shared(recv_sz, send_sz) \
+  reduction(+:rcnt, scnt) schedule(static)
+#endif
+  for (int i = 0; i < nprocs; i++) {
+    rcnt += recv_sz[i];
+    scnt += send_sz[i];
+  }
+#ifdef DEBUG_PRINTF  
+  std::cout << "[" << me << "]Total number of remote communities to update: " << scnt << std::endl;
+#endif
+
+  GraphElem currPos = 0;
+  std::vector<CommInfo> rdata(rcnt);
+
+#ifdef DEBUG_PRINTF  
+  const double t2 = MPI_Wtime();
+#endif
+#if defined(USE_MPI_SENDRECV)
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me)
+          MPI_Sendrecv(remoteArray[i].data(), send_sz[i], commType, i, CommunityDataTag, 
+                  rdata.data() + currPos, recv_sz[i], commType, i, CommunityDataTag, 
+                  gcomm, MPI_STATUS_IGNORE);
+
+      currPos += recv_sz[i];
+  }
+#else
+  std::vector<MPI_Request> sreqs(nprocs), rreqs(nprocs);
+  for (int i = 0; i < nprocs; i++) {
+    if (i != me)
+      MPI_Irecv(rdata.data() + currPos, recv_sz[i], commType, i, 
+              CommunityDataTag, gcomm, &rreqs[i]);
+    else
+      rreqs[i] = MPI_REQUEST_NULL;
+
+    currPos += recv_sz[i];
+  }
+
+  for (int i = 0; i < nprocs; i++) {
+    if (i != me)
+      MPI_Isend(remoteArray[i].data(), send_sz[i], commType, i, 
+              CommunityDataTag, gcomm, &sreqs[i]);
+    else
+      sreqs[i] = MPI_REQUEST_NULL;
+  }
+
+  MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE);
+  MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE);
+#endif
+#ifdef DEBUG_PRINTF  
+  const double t3 = MPI_Wtime();
+  std::cout << "[" << me << "]Update remote community MPI time: " << (t3 - t2) << std::endl;
+#endif
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for shared(rdata, localCinfo) schedule(runtime)
+#else
+#pragma omp parallel for shared(rdata, localCinfo) schedule(dynamic)
+#endif
+  for (GraphElem i = 0; i < rcnt; i++) {
+    const CommInfo &curr = rdata[i];
+
+#ifdef DEBUG_PRINTF  
+    assert(dg.get_owner(curr.community) == me);
+#endif
+    localCinfo[curr.community-base].size += curr.size;
+    localCinfo[curr.community-base].degree += curr.degree;
+  }
+} // updateRemoteCommunities
+
+// initial setup before Louvain iteration begins
+#if defined(USE_MPI_RMA)
+void exchangeVertexReqs(const Graph &dg, size_t &ssz, size_t &rsz,
+        std::vector<GraphElem> &ssizes, std::vector<GraphElem> &rsizes, 
+        std::vector<GraphElem> &svdata, std::vector<GraphElem> &rvdata,
+        const int me, const int nprocs, MPI_Win &commwin)
+#else
+void exchangeVertexReqs(const Graph &dg, size_t &ssz, size_t &rsz,
+        std::vector<GraphElem> &ssizes, std::vector<GraphElem> &rsizes, 
+        std::vector<GraphElem> &svdata, std::vector<GraphElem> &rvdata,
+        const int me, const int nprocs)
+#endif
+{
+  const GraphElem base = dg.get_base(me), bound = dg.get_bound(me);
+  const GraphElem nv = dg.get_lnv();
+  MPI_Comm gcomm = dg.get_comm();
+
+#ifdef USE_OPENMP_LOCK
+  std::vector<omp_lock_t> locks(nprocs);
+  for (int i = 0; i < nprocs; i++)
+    omp_init_lock(&locks[i]);
+#endif
+  std::vector<std::unordered_set<GraphElem>> parray(nprocs);
+
+#ifdef USE_OPENMP_LOCK
+#pragma omp parallel default(none), shared(dg, locks, parray) firstprivate(nv, me)
+#else
+#pragma omp parallel default(none), shared(dg, parray) firstprivate(nv, me)
+#endif
+  {
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp for schedule(runtime)
+#else
+#pragma omp for schedule(guided)
+#endif
+    for (GraphElem i = 0; i < nv; i++) {
+      GraphElem e0, e1;
+
+      dg.edge_range(i, e0, e1);
+
+      for (GraphElem j = e0; j < e1; j++) {
+	const Edge &edge = dg.get_edge(j);
+	const int tproc = dg.get_owner(edge.tail_);
+
+	if (tproc != me) {
+#ifdef USE_OPENMP_LOCK
+	  omp_set_lock(&locks[tproc]);
+#else
+          lock();
+#endif
+	  parray[tproc].insert(edge.tail_);
+#ifdef USE_OPENMP_LOCK
+	  omp_unset_lock(&locks[tproc]);
+#else
+          unlock();
+#endif
+	}
+      }
+    }
+  }
+
+#ifdef USE_OPENMP_LOCK
+  for (int i = 0; i < nprocs; i++) {
+    omp_destroy_lock(&locks[i]);
+  }
+#endif
+  
+  rsizes.resize(nprocs);
+  ssizes.resize(nprocs);
+  ssz = 0, rsz = 0;
+
+  int pproc = 0;
+  // TODO FIXME parallelize this loop
+  for (std::vector<std::unordered_set<GraphElem>>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) {
+    ssz += iter->size();
+    ssizes[pproc] = iter->size();
+    pproc++;
+  }
+
+  MPI_Alltoall(ssizes.data(), 1, MPI_GRAPH_TYPE, rsizes.data(), 
+          1, MPI_GRAPH_TYPE, gcomm);
+
+  GraphElem rsz_r = 0;
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for shared(rsizes) \
+  reduction(+:rsz_r) schedule(runtime)
+#else
+#pragma omp parallel for shared(rsizes) \
+  reduction(+:rsz_r) schedule(static)
+#endif
+  for (int i = 0; i < nprocs; i++)
+    rsz_r += rsizes[i];
+  rsz = rsz_r;
+  
+  svdata.resize(ssz);
+  rvdata.resize(rsz);
+
+  GraphElem cpos = 0, rpos = 0;
+  pproc = 0;
+
+#if defined(USE_MPI_COLLECTIVES)
+  std::vector<int> scnts(nprocs), rcnts(nprocs), sdispls(nprocs), rdispls(nprocs);
+  
+  for (std::vector<std::unordered_set<GraphElem>>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) {
+      std::copy(iter->begin(), iter->end(), svdata.begin() + cpos);
+      
+      scnts[pproc] = iter->size();
+      rcnts[pproc] = rsizes[pproc];
+      sdispls[pproc] = cpos;
+      rdispls[pproc] = rpos;
+      cpos += iter->size();
+      rpos += rcnts[pproc];
+
+      pproc++;
+  }
+
+  scnts[me] = 0;
+  rcnts[me] = 0;
+  MPI_Alltoallv(svdata.data(), scnts.data(), sdispls.data(), 
+          MPI_GRAPH_TYPE, rvdata.data(), rcnts.data(), rdispls.data(), 
+          MPI_GRAPH_TYPE, gcomm);
+#else
+  std::vector<MPI_Request> rreqs(nprocs), sreqs(nprocs);
+  for (int i = 0; i < nprocs; i++) {
+      if (i != me)
+          MPI_Irecv(rvdata.data() + rpos, rsizes[i], MPI_GRAPH_TYPE, i, 
+                  VertexTag, gcomm, &rreqs[i]);
+      else
+          rreqs[i] = MPI_REQUEST_NULL;
+
+      rpos += rsizes[i];
+  }
+
+  for (std::vector<std::unordered_set<GraphElem>>::const_iterator iter = parray.begin(); iter != parray.end(); iter++) {
+      std::copy(iter->begin(), iter->end(), svdata.begin() + cpos);
+
+      if (me != pproc)
+          MPI_Isend(svdata.data() + cpos, iter->size(), MPI_GRAPH_TYPE, pproc, 
+                  VertexTag, gcomm, &sreqs[pproc]);
+      else
+          sreqs[pproc] = MPI_REQUEST_NULL;
+
+      cpos += iter->size();
+      pproc++;
+  }
+
+  MPI_Waitall(nprocs, sreqs.data(), MPI_STATUSES_IGNORE);
+  MPI_Waitall(nprocs, rreqs.data(), MPI_STATUSES_IGNORE);
+#endif
+
+  std::swap(svdata, rvdata);
+  std::swap(ssizes, rsizes);
+  std::swap(ssz, rsz);
+
+  // create MPI window for communities
+#if defined(USE_MPI_RMA)  
+  GraphElem *ptr = nullptr;
+  MPI_Info info = MPI_INFO_NULL;
+#if defined(USE_MPI_ACCUMULATE)
+  MPI_Info_create(&info);
+  MPI_Info_set(info, "accumulate_ordering", "none");
+  MPI_Info_set(info, "accumulate_ops", "same_op");
+#endif
+  MPI_Win_allocate(rsz*sizeof(GraphElem), sizeof(GraphElem), 
+          info, gcomm, &ptr, &commwin);
+  MPI_Win_lock_all(MPI_MODE_NOCHECK, commwin);
+#endif
+} // exchangeVertexReqs
+
+#if defined(USE_MPI_RMA)
+GraphWeight distLouvainMethod(const int me, const int nprocs, const Graph &dg,
+        size_t &ssz, size_t &rsz, std::vector<GraphElem> &ssizes, std::vector<GraphElem> &rsizes, 
+        std::vector<GraphElem> &svdata, std::vector<GraphElem> &rvdata, const GraphWeight lower, 
+        const GraphWeight thresh, int &iters, MPI_Win &commwin)
+#else
+GraphWeight distLouvainMethod(const int me, const int nprocs, const Graph &dg,
+        size_t &ssz, size_t &rsz, std::vector<GraphElem> &ssizes, std::vector<GraphElem> &rsizes, 
+        std::vector<GraphElem> &svdata, std::vector<GraphElem> &rvdata, const GraphWeight lower, 
+        const GraphWeight thresh, int &iters)
+#endif
+{
+  std::vector<GraphElem> pastComm, currComm, targetComm;
+  std::vector<GraphWeight> vDegree;
+  std::vector<GraphWeight> clusterWeight;
+  std::vector<Comm> localCinfo, localCupdate;
+ 
+  std::unordered_map<GraphElem, GraphElem> remoteComm;
+  std::map<GraphElem,Comm> remoteCinfo, remoteCupdate;
+  
+  const GraphElem nv = dg.get_lnv();
+  MPI_Comm gcomm = dg.get_comm();
+
+  GraphWeight constantForSecondTerm;
+  GraphWeight prevMod = lower;
+  GraphWeight currMod = -1.0;
+  int numIters = 0;
+  
+  distInitLouvain(dg, pastComm, currComm, vDegree, clusterWeight, localCinfo, 
+          localCupdate, constantForSecondTerm, me);
+  targetComm.resize(nv);
+
+#ifdef DEBUG_PRINTF  
+  std::cout << "[" << me << "]constantForSecondTerm: " << constantForSecondTerm << std::endl;
+  if (me == 0)
+      std::cout << "Threshold: " << thresh << std::endl;
+#endif
+  const GraphElem base = dg.get_base(me), bound = dg.get_bound(me);
+
+#ifdef DEBUG_PRINTF  
+  double t0, t1;
+  t0 = MPI_Wtime();
+#endif
+
+  // setup vertices and communities
+#if defined(USE_MPI_RMA)
+  exchangeVertexReqs(dg, ssz, rsz, ssizes, rsizes, 
+          svdata, rvdata, me, nprocs, commwin);
+  
+  // store the remote displacements 
+  std::vector<MPI_Aint> disp(nprocs);
+  MPI_Exscan(ssizes.data(), (GraphElem*)disp.data(), nprocs, MPI_GRAPH_TYPE, 
+          MPI_SUM, gcomm);
+#else
+  exchangeVertexReqs(dg, ssz, rsz, ssizes, rsizes, 
+          svdata, rvdata, me, nprocs);
+#endif
+
+#ifdef DEBUG_PRINTF  
+  t1 = MPI_Wtime();
+  std::cout << "[" << me << "]Initial communication setup time before Louvain iteration (in s): " << (t1 - t0) << std::endl;
+#endif
+ 
+#ifdef OMP_GPU_ALLOC
+  GraphElem *d_edge_indices = (GraphElem *)omp_target_alloc(
+      (unsigned long long)dg.edge_indices_.size() * sizeof(GraphElem), -100);
+  memcpy(d_edge_indices, &dg.edge_indices_[0], dg.edge_indices_.size() * sizeof(GraphElem));
+  GraphElem *d_parts = (GraphElem *)omp_target_alloc(
+      (unsigned long long)dg.parts_.size() * sizeof(GraphElem), -100);
+  memcpy(d_parts, &dg.parts_[0], dg.parts_.size() * sizeof(GraphElem));
+  Edge *d_edge_list = (Edge *)omp_target_alloc(
+      (unsigned long long)dg.edge_list_.size() * sizeof(Edge), -100);
+  memcpy(d_edge_list, &dg.edge_list_[0], dg.edge_list_.size() * sizeof(Edge));
+  GraphElem *d_currComm = (GraphElem *)omp_target_alloc(
+      (unsigned long long)nv * sizeof(GraphElem), -100);
+  memcpy(d_currComm, &currComm[0], nv * sizeof(GraphElem));
+  GraphWeight *d_vDegree = (GraphWeight *)omp_target_alloc(
+      (unsigned long long)nv * sizeof(GraphWeight), -100);
+  memcpy(d_vDegree, &vDegree[0], nv * sizeof(GraphWeight));
+  GraphElem *d_targetComm = (GraphElem *)omp_target_alloc(
+      (unsigned long long)nv * sizeof(GraphElem), -100);
+  memcpy(d_targetComm, &targetComm[0], nv * sizeof(GraphElem));
+  Comm *d_localCinfo =
+      (Comm *)omp_target_alloc((unsigned long long)nv * sizeof(Comm), -100);
+  memcpy(d_localCinfo, &localCinfo[0], nv * sizeof(Comm));
+  Comm *d_localCupdate =
+      (Comm *)omp_target_alloc((unsigned long long)nv * sizeof(Comm), -100);
+  memcpy(d_localCupdate, &localCupdate[0], nv * sizeof(Comm));
+  GraphWeight *d_clusterWeight = (GraphWeight *)omp_target_alloc(
+      (unsigned long long)nv * sizeof(GraphWeight), -100);
+  memcpy(d_clusterWeight, &clusterWeight[0], nv * sizeof(GraphWeight));
+#else
+  const GraphElem *d_edge_indices = &dg.edge_indices_[0];
+  d_edge_indices = (GraphElem *)omp_target_alloc((unsigned long long)d_edge_indices, -200);
+  const GraphElem *d_parts = &dg.parts_[0];
+  d_parts = (GraphElem *)omp_target_alloc((unsigned long long)d_parts, -200);
+  const Edge *d_edge_list = &dg.edge_list_[0];
+  d_edge_list = (Edge *)omp_target_alloc((unsigned long long)d_edge_list, -200);
+  GraphElem *d_currComm = &currComm[0];
+  d_currComm = (GraphElem *)omp_target_alloc((unsigned long long)d_currComm, -200);
+  const GraphWeight *d_vDegree = &vDegree[0];
+  d_vDegree = (GraphWeight *)omp_target_alloc((unsigned long long)d_vDegree, -200);
+  GraphElem *d_targetComm = &targetComm[0];
+  d_targetComm = (GraphElem *)omp_target_alloc((unsigned long long)d_targetComm, -200);
+  Comm *d_localCinfo = &localCinfo[0];
+  d_localCinfo = (Comm *)omp_target_alloc((unsigned long long)d_localCinfo, -200);
+  Comm *d_localCupdate = &localCupdate[0];
+  d_localCupdate = (Comm *)omp_target_alloc((unsigned long long)d_localCupdate, -200);
+  GraphWeight *d_clusterWeight = &clusterWeight[0];
+  d_clusterWeight = (GraphWeight *)omp_target_alloc((unsigned long long)d_clusterWeight, -200);
+#endif
+
+  double size = sizeof(GraphElem) *
+                    (dg.edge_indices_.size() + dg.parts_.size() + 2 * nv) +
+                sizeof(Edge) * dg.edge_list_.size() +
+                sizeof(GraphWeight) * 2 * nv + sizeof(Comm) * 2 * nv;
+  double t_start = omp_get_wtime();
+
+  // start Louvain iteration
+  // NOTE: capped at two iterations here; the uncapped loop was: while(true) {
+  while(numIters < 2) {
+#ifdef DEBUG_PRINTF  
+    double t2 = omp_get_wtime();
+    if (me == 0)
+        std::cout << "Starting Louvain iteration: " << numIters << std::endl;
+#endif
+    numIters++;
+
+#ifdef DEBUG_PRINTF  
+    t0 = MPI_Wtime();
+#endif
+
+#if defined(USE_MPI_RMA)
+    fillRemoteCommunities(dg, me, nprocs, ssz, rsz, ssizes, 
+            rsizes, svdata, rvdata, currComm, localCinfo, 
+            remoteCinfo, remoteComm, remoteCupdate, 
+            commwin, disp);
+#else
+    fillRemoteCommunities(dg, me, nprocs, ssz, rsz, ssizes, 
+            rsizes, svdata, rvdata, currComm, localCinfo, 
+            remoteCinfo, remoteComm, remoteCupdate);
+#endif
+
+#ifdef DEBUG_PRINTF  
+    t1 = MPI_Wtime();
+    std::cout << "[" << me << "]Remote community map size: " << remoteComm.size() << std::endl;
+    std::cout << "[" << me << "]Iteration communication time: " << (t1 - t0) << std::endl;
+#endif
+
+#ifdef DEBUG_PRINTF  
+    t0 = MPI_Wtime();
+#endif
+
+#if defined(OMP_GPU)
+#else
+#pragma omp parallel default(none), shared(d_clusterWeight, d_localCupdate, d_currComm, d_targetComm, \
+        d_vDegree, d_localCinfo, d_edge_indices, d_parts, d_edge_list), \
+    firstprivate(constantForSecondTerm, nv, me)
+#endif
+    {
+        distCleanCWandCU(nv, d_clusterWeight, d_localCupdate);
+
+#if defined(OMP_GPU)
+#pragma omp target teams distribute parallel for map(                          \
+    to                                                                         \
+    : d_edge_indices [0:dg.edge_indices_.size()],                              \
+      d_parts [0:dg.parts_.size()], d_edge_list [0:dg.edge_list_.size()],      \
+      d_currComm [0:nv], d_vDegree [0:nv], d_localCinfo [0:nv])                \
+    map(from                                                                   \
+        : d_targetComm [0:nv])                                                 \
+        map(tofrom                                                             \
+            : d_localCupdate [0:nv], d_clusterWeight [0:nv])
+#elif defined(OMP_SCHEDULE_RUNTIME)
+#pragma omp for schedule(runtime)
+#else
+#pragma omp for schedule(guided) 
+#endif
+        for (GraphElem i = 0; i < nv; i++) {
+            distExecuteLouvainIteration(i, d_edge_indices, d_parts, d_edge_list, d_currComm, d_targetComm, d_vDegree, d_localCinfo, 
+                    d_localCupdate, constantForSecondTerm, d_clusterWeight, me);
+        }
+    }
+
+#if defined(OMP_GPU)
+#else
+#pragma omp parallel default(none), shared(d_localCinfo, d_localCupdate) firstprivate(nv)
+#endif
+    {
+        distUpdateLocalCinfo(nv, d_localCinfo, d_localCupdate);
+    }
+
+    // communicate remote communities
+    updateRemoteCommunities(dg, localCinfo, remoteCupdate, me, nprocs);
+
+    // compute modularity
+    currMod = distComputeModularity(dg, d_localCinfo, d_clusterWeight, constantForSecondTerm, me);
+
+    // exit criteria
+    if (currMod - prevMod < thresh)
+        break;
+
+    prevMod = currMod;
+    if (prevMod < lower)
+        prevMod = lower;
+
+#ifdef OMP_SCHEDULE_RUNTIME
+#pragma omp parallel for default(none) \
+    shared(pastComm, d_currComm, d_targetComm) firstprivate(nv) \
+    schedule(runtime)
+#else
+#pragma omp parallel for default(none) \
+    shared(pastComm, d_currComm, d_targetComm) firstprivate(nv) \
+    schedule(static)
+#endif
+    for (GraphElem i = 0; i < nv; i++) {
+        GraphElem tmp = pastComm[i];
+        pastComm[i] = d_currComm[i];
+        d_currComm[i] = d_targetComm[i];
+        d_targetComm[i] = tmp;
+    }
+  } // end of Louvain iteration
+  std::cout << "Total size (GiB): " << size / 1024 / 1024 / 1024 << std::endl;
+  std::cout << "Time (s): " << omp_get_wtime() - t_start << std::endl;
+
+#if defined(USE_MPI_RMA)
+  MPI_Win_unlock_all(commwin);
+  MPI_Win_free(&commwin);
+#endif  
+
+  iters = numIters;
+
+  vDegree.clear();
+  pastComm.clear();
+  currComm.clear();
+  targetComm.clear();
+  clusterWeight.clear();
+  localCinfo.clear();
+  localCupdate.clear();
+  
+  return prevMod;
+} // distLouvainMethod plain
+
+#endif // __DSPL
diff --git a/miniVite/err b/miniVite/err
new file mode 100644
index 0000000..16184dd
--- /dev/null
+++ b/miniVite/err
@@ -0,0 +1,8 @@
+mpicxx main.o -O3 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -DOMP_GPU -DCHECK_NUM_EDGES -DDEBUG_PRINTF  -o miniVite
+nvlink error   : Undefined reference to '_ZdlPv' in '/tmp/main-f4854c.cubin'
+nvlink error   : Undefined reference to '_ZSt17__throw_bad_allocv' in '/tmp/main-f4854c.cubin'
+nvlink error   : Undefined reference to '_Znwm' in '/tmp/main-f4854c.cubin'
+nvlink error   : Undefined reference to '_ZNKSt8__detail20_Prime_rehash_policy14_M_need_rehashEmmm' in '/tmp/main-f4854c.cubin'
+nvlink error   : Undefined reference to '__assert_fail' in '/tmp/main-f4854c.cubin'
+nvlink error   : Undefined reference to '_ZNKSt8__detail20_Prime_rehash_policy11_M_next_bktEm' in '/tmp/main-f4854c.cubin'
+clang-9: error: nvlink command failed with exit code 255 (use -v to see invocation)
diff --git a/miniVite/graph.hpp b/miniVite/graph.hpp
new file mode 100644
index 0000000..ad8631d
--- /dev/null
+++ b/miniVite/graph.hpp
@@ -0,0 +1,1053 @@
+// ***********************************************************************
+//
+//                              miniVite
+//
+// ***********************************************************************
+//
+//       Copyright (2018) Battelle Memorial Institute
+//                      All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions
+// are met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the copyright holder nor the names of its
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+// COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+// BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+// ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+// POSSIBILITY OF SUCH DAMAGE.
+//
+// ************************************************************************
+
+#pragma once
+#ifndef GRAPH_HPP
+#define GRAPH_HPP
+
+#include <iostream>
+#include <algorithm>
+#include <vector>
+#include <string>
+#include <fstream>
+#include <sstream>
+#include <climits>
+#include <array>
+#include <unordered_map>
+
+#include <mpi.h>
+
+#include "utils.hpp"
+
+unsigned seed;
+
+struct Edge
+{
+    GraphElem tail_;
+    GraphWeight weight_;
+    
+    Edge(): tail_(-1), weight_(0.0) {}
+};
+
+struct EdgeTuple
+{
+    GraphElem ij_[2];
+    GraphWeight w_;
+
+    EdgeTuple(GraphElem i, GraphElem j, GraphWeight w): 
+        ij_{i, j}, w_(w)
+    {}
+    EdgeTuple(GraphElem i, GraphElem j): 
+        ij_{i, j}, w_(1.0) 
+    {}
+    EdgeTuple(): 
+        ij_{-1, -1}, w_(0.0)
+    {}
+};
+
+// per process graph instance
+class Graph
+{
+    public:
+        Graph(): 
+            lnv_(-1), lne_(-1), nv_(-1), 
+            ne_(-1), comm_(MPI_COMM_WORLD) 
+        {
+            MPI_Comm_size(comm_, &size_);
+            MPI_Comm_rank(comm_, &rank_);
+        }
+        
+        Graph(GraphElem lnv, GraphElem lne, 
+                GraphElem nv, GraphElem ne, 
+                MPI_Comm comm=MPI_COMM_WORLD): 
+            lnv_(lnv), lne_(lne), 
+            nv_(nv), ne_(ne), 
+            comm_(comm) 
+        {
+            MPI_Comm_size(comm_, &size_);
+            MPI_Comm_rank(comm_, &rank_);
+
+            edge_indices_.resize(lnv_+1, 0);
+            edge_list_.resize(lne_); // this is usually populated later
+
+            parts_.resize(size_+1);
+            parts_[0] = 0;
+
+            for (GraphElem i = 1; i < size_+1; i++)
+                parts_[i]=((nv_ * i) / size_);  
+        }
+
+        ~Graph() 
+        {
+            edge_list_.clear();
+            edge_indices_.clear();
+            parts_.clear();
+        }
+       
+        // TODO FIXME add asserts like the following
+        // to every member function of the Graph class
+        void set_edge_index(GraphElem const vertex, GraphElem const e0)
+        {
+#if defined(DEBUG_BUILD)
+            assert((vertex >= 0) && (vertex <= lnv_));
+            assert((e0 >= 0) && (e0 <= lne_));
+            edge_indices_.at(vertex) = e0;
+#else
+            edge_indices_[vertex] = e0;
+#endif
+        } 
+        
+        void edge_range(GraphElem const vertex, GraphElem& e0, 
+                GraphElem& e1) const
+        {
+            e0 = edge_indices_[vertex];
+            e1 = edge_indices_[vertex+1];
+        } 
+
+        // collective
+        void set_nedges(GraphElem lne) 
+        { 
+            lne_ = lne; 
+            edge_list_.resize(lne_);
+
+            // compute total number of edges
+            ne_ = 0;
+            MPI_Allreduce(&lne_, &ne_, 1, MPI_GRAPH_TYPE, MPI_SUM, comm_);
+        }
+
+        GraphElem get_base(const int rank) const
+        { return parts_[rank]; }
+
+        GraphElem get_bound(const int rank) const
+        { return parts_[rank+1]; }
+        
+        GraphElem get_range(const int rank) const
+        { return (parts_[rank+1] - parts_[rank] + 1); }
+
+        int get_owner(const GraphElem vertex) const
+        {
+            const std::vector<GraphElem>::const_iterator iter = 
+                std::upper_bound(parts_.begin(), parts_.end(), vertex);
+
+            return (iter - parts_.begin() - 1);
+        }
+
+        GraphElem get_lnv() const { return lnv_; }
+        GraphElem get_lne() const { return lne_; }
+        GraphElem get_nv() const { return nv_; }
+        GraphElem get_ne() const { return ne_; }
+        MPI_Comm get_comm() const { return comm_; }
+       
+        // return edge and active info
+        // ----------------------------
+       
+        Edge const& get_edge(GraphElem const index) const
+        { return edge_list_[index]; }
+         
+        Edge& set_edge(GraphElem const index)
+        { return edge_list_[index]; }       
+        
+        // local <--> global index translation
+        // -----------------------------------
+        GraphElem local_to_global(GraphElem idx)
+        { return (idx + get_base(rank_)); }
+
+        GraphElem global_to_local(GraphElem idx)
+        { return (idx - get_base(rank_)); }
+       
+        // w.r.t passed rank
+        GraphElem local_to_global(GraphElem idx, int rank)
+        { return (idx + get_base(rank)); }
+
+        GraphElem global_to_local(GraphElem idx, int rank)
+        { return (idx - get_base(rank)); }
+ 
+        // print edge list (with weights)
+        void print(bool print_weight = true) const
+        {
+            if (lne_ < MAX_PRINT_NEDGE)
+            {
+                for (int p = 0; p < size_; p++)
+                {
+                    MPI_Barrier(comm_);
+                    if (p == rank_)
+                    {
+                        std::cout << "###############" << std::endl;
+                        std::cout << "Process #" << p << ": " << std::endl;
+                        std::cout << "###############" << std::endl;
+                        GraphElem base = get_base(p);
+                        for (GraphElem i = 0; i < lnv_; i++)
+                        {
+                            GraphElem e0, e1;
+                            edge_range(i, e0, e1);
+                            if (print_weight) { // print weights (default)
+                                for (GraphElem e = e0; e < e1; e++)
+                                {
+                                    Edge const& edge = get_edge(e);
+                                    std::cout << i+base << " " << edge.tail_ << " " << edge.weight_ << std::endl;
+                                }
+                            }
+                            else { // don't print weights
+                                for (GraphElem e = e0; e < e1; e++)
+                                {
+                                    Edge const& edge = get_edge(e);
+                                    std::cout << i+base << " " << edge.tail_ << std::endl;
+                                }
+                            }
+                        }
+                        MPI_Barrier(comm_);
+                    }
+                }
+            }
+            else
+            {
+                if (rank_ == 0)
+                    std::cout << "Graph size per process is {" << lnv_ << ", " << lne_ << 
+                        "}, which will overwhelm STDOUT." << std::endl;
+            }
+        }
+       
+        // print statistics about edge distribution
+        void print_dist_stats()
+        {
+            GraphElem sumdeg = 0, maxdeg = 0;
+
+            MPI_Reduce(&lne_, &sumdeg, 1, MPI_GRAPH_TYPE, MPI_SUM, 0, comm_);
+            MPI_Reduce(&lne_, &maxdeg, 1, MPI_GRAPH_TYPE, MPI_MAX, 0, comm_);
+
+            GraphElem my_sq = lne_*lne_;
+            GraphElem sum_sq = 0;
+            MPI_Reduce(&my_sq, &sum_sq, 1, MPI_GRAPH_TYPE, MPI_SUM, 0, comm_);
+
+            GraphWeight average  = (GraphWeight) sumdeg / size_;
+            GraphWeight avg_sq   = (GraphWeight) sum_sq / size_;
+            GraphWeight var      = avg_sq - (average*average);
+            GraphWeight stddev   = sqrt(var);
+
+            MPI_Barrier(comm_);
+
+            if (rank_ == 0)
+            {
+                std::cout << std::endl;
+                std::cout << "-------------------------------------------------------" << std::endl;
+                std::cout << "Graph edge distribution characteristics" << std::endl;
+                std::cout << "-------------------------------------------------------" << std::endl;
+                std::cout << "Number of vertices: " << nv_ << std::endl;
+                std::cout << "Number of edges: " << ne_ << std::endl;
+                std::cout << "Maximum number of edges: " << maxdeg << std::endl;
+                std::cout << "Average number of edges: " << average << std::endl;
+                std::cout << "Expected value of X^2: " << avg_sq << std::endl;
+                std::cout << "Variance: " << var << std::endl;
+                std::cout << "Standard deviation: " << stddev << std::endl;
+                std::cout << "-------------------------------------------------------" << std::endl;
+
+            }
+        }
+
+        // public variables
+        std::vector<GraphElem> edge_indices_;
+        std::vector<Edge> edge_list_;
+        GraphElem lnv_, lne_, nv_, ne_;
+        std::vector<GraphElem> parts_;        
+        
+        MPI_Comm comm_; 
+        int rank_, size_;
+    private:
+};
+
+// read in binary edge list files
+// using MPI I/O
+class BinaryEdgeList
+{
+    public:
+        BinaryEdgeList() : 
+            M_(-1), N_(-1), 
+            M_local_(-1), N_local_(-1), 
+            comm_(MPI_COMM_WORLD) 
+        {}
+        BinaryEdgeList(MPI_Comm comm) : 
+            M_(-1), N_(-1), 
+            M_local_(-1), N_local_(-1), 
+            comm_(comm) 
+        {}
+
+        // the input binary file will be sorted by
+        // vertices
+        // read a file and return a graph
+        Graph* read(int me, int nprocs, int ranks_per_node, std::string file)
+        {
+            int file_open_error;
+            MPI_File fh;
+            MPI_Status status;
+
+            // specify the number of aggregators
+            MPI_Info info;
+            MPI_Info_create(&info);
+            int naggr = (ranks_per_node > 1) ? (nprocs/ranks_per_node) : ranks_per_node;
+            if (naggr >= nprocs)
+                naggr = 1;
+            std::stringstream tmp_str;
+            tmp_str << naggr;
+            std::string str = tmp_str.str();
+            MPI_Info_set(info, "cb_nodes", str.c_str());
+
+            file_open_error = MPI_File_open(comm_, file.c_str(), MPI_MODE_RDONLY, info, &fh); 
+            MPI_Info_free(&info);
+
+            if (file_open_error != MPI_SUCCESS) 
+            {
+                std::cout << " Error opening file! " << std::endl;
+                MPI_Abort(comm_, -99);
+            }
+
+            // read the dimensions 
+            MPI_File_read_all(fh, &M_, sizeof(GraphElem), MPI_BYTE, &status);
+            MPI_File_read_all(fh, &N_, sizeof(GraphElem), MPI_BYTE, &status);
+            M_local_ = ((M_*(me + 1)) / nprocs) - ((M_*me) / nprocs); 
+
+            // create local graph
+            Graph *g = new Graph(M_local_, 0, M_, N_);
+
+            // Let N = array length and P = number of processors.
+            // From j = 0 to P-1,
+            // Starting point of array on processor j = floor(N * j / P)
+            // Length of array on processor j = floor(N * (j + 1) / P) - floor(N * j / P)
+
+            uint64_t tot_bytes=(M_local_+1)*sizeof(GraphElem);
+            MPI_Offset offset = 2*sizeof(GraphElem) + ((M_*me) / nprocs)*sizeof(GraphElem);
+
+            // read in INT_MAX increments if total byte size is > INT_MAX
+            
+            if (tot_bytes < INT_MAX)
+                MPI_File_read_at(fh, offset, &g->edge_indices_[0], tot_bytes, MPI_BYTE, &status);
+            else 
+            {
+                int chunk_bytes=INT_MAX;
+                uint8_t *curr_pointer = (uint8_t*) &g->edge_indices_[0];
+                uint64_t transf_bytes = 0;
+
+                while (transf_bytes < tot_bytes)
+                {
+                    MPI_File_read_at(fh, offset, curr_pointer, chunk_bytes, MPI_BYTE, &status);
+                    transf_bytes += chunk_bytes;
+                    offset += chunk_bytes;
+                    curr_pointer += chunk_bytes;
+
+                    if ((tot_bytes - transf_bytes) < INT_MAX)
+                        chunk_bytes = tot_bytes - transf_bytes;
+                } 
+            }    
+
+            N_local_ = g->edge_indices_[M_local_] - g->edge_indices_[0];
+            g->set_nedges(N_local_);
+
+            tot_bytes = N_local_*(sizeof(Edge));
+            offset = 2*sizeof(GraphElem) + (M_+1)*sizeof(GraphElem) + g->edge_indices_[0]*(sizeof(Edge));
+
+            if (tot_bytes < INT_MAX)
+                MPI_File_read_at(fh, offset, &g->edge_list_[0], tot_bytes, MPI_BYTE, &status);
+            else 
+            {
+                int chunk_bytes=INT_MAX;
+                uint8_t *curr_pointer = (uint8_t*)&g->edge_list_[0];
+                uint64_t transf_bytes = 0;
+
+                while (transf_bytes < tot_bytes)
+                {
+                    MPI_File_read_at(fh, offset, curr_pointer, chunk_bytes, MPI_BYTE, &status);
+                    transf_bytes += chunk_bytes;
+                    offset += chunk_bytes;
+                    curr_pointer += chunk_bytes;
+
+                    if ((tot_bytes - transf_bytes) < INT_MAX)
+                        chunk_bytes = (tot_bytes - transf_bytes);
+                } 
+            }    
+
+            MPI_File_close(&fh);
+
+            for(GraphElem i=1;  i < M_local_+1; i++)
+                g->edge_indices_[i] -= g->edge_indices_[0];   
+            g->edge_indices_[0] = 0;
+
+            return g;
+        }
+    private:
+        GraphElem M_;
+        GraphElem N_;
+        GraphElem M_local_;
+        GraphElem N_local_;
+        MPI_Comm comm_;
+};
+
+// RGG graph
+// 1D vertex distribution
+class GenerateRGG
+{
+    public:
+        GenerateRGG(GraphElem nv, MPI_Comm comm = MPI_COMM_WORLD)
+        {
+            nv_ = nv;
+            comm_ = comm;
+
+            MPI_Comm_rank(comm_, &rank_);
+            MPI_Comm_size(comm_, &nprocs_);
+
+            // neighbors
+            up_ = down_ = MPI_PROC_NULL;
+            if (nprocs_ > 1) {
+                if (rank_ > 0 && rank_ < (nprocs_ - 1)) {
+                    up_ = rank_ - 1;
+                    down_ = rank_ + 1;
+                }
+                if (rank_ == 0)
+                    down_ = 1;
+                if (rank_ == (nprocs_ - 1))
+                    up_ = rank_ - 1;
+            }
+
+            n_ = nv_ / nprocs_;
+
+            // check if number of nodes is divisible by #processes
+            if ((nv_ % nprocs_) != 0) {
+                if (rank_ == 0) {
+                    std::cout << "[ERROR] Number of vertices must be perfectly divisible by number of processes." << std::endl;
+                    std::cout << "Exiting..." << std::endl;
+                }
+                MPI_Abort(comm_, -99);
+            }
+
+            // check if the number of processes is a power of 2
+            if (!is_pwr2(nprocs_)) {
+                if (rank_ == 0) {
+                    std::cout << "[ERROR] Number of processes must be a power of 2." << std::endl;
+                    std::cout << "Exiting..." << std::endl;
+                }
+                MPI_Abort(comm_, -99);
+            }
+
+            // calculate r(n)
+            GraphWeight rc = sqrt((GraphWeight)log(nv)/(GraphWeight)(PI*nv));
+            GraphWeight rt = sqrt((GraphWeight)2.0736/(GraphWeight)nv);
+            rn_ = (rc + rt)/(GraphWeight)2.0;
+            
+            assert(((GraphWeight)1.0/(GraphWeight)nprocs_) > rn_);
+            
+            MPI_Barrier(comm_);
+        }
+
+        // creates an RGG and returns a Graph
+        // TODO FIXME use OpenMP wherever possible
+        // use Euclidean distance as edge weight
+        // for random edges, choose from (0,1)
+        // otherwise, use unit weight throughout
+        Graph* generate(bool isLCG, bool unitEdgeWeight = true, int randomEdgePercent = 0)
+        {
+            // Generate random coordinate points
+            std::vector<GraphWeight> X, Y, X_up, Y_up, X_down, Y_down;
+                       
+            if (isLCG)
+                X.resize(2*n_);
+            else
+                X.resize(n_);
+
+            Y.resize(n_);
+
+            if (up_ != MPI_PROC_NULL) {
+                X_up.resize(n_);
+                Y_up.resize(n_);
+            }
+
+            if (down_ != MPI_PROC_NULL) {
+                X_down.resize(n_);
+                Y_down.resize(n_);
+            }
+    
+            // create local graph
+            Graph *g = new Graph(n_, 0, nv_, nv_);
+
+            // generate random number within range
+            // X: 0, 1
+            // Y: rank_*1/p, (rank_+1)*1/p,
+            GraphWeight rec_np = (GraphWeight)(1.0/(GraphWeight)nprocs_);
+            GraphWeight lo = rank_* rec_np; 
+            GraphWeight hi = lo + rec_np;
+            assert(hi > lo);
+
+            // measure the time to generate random numbers
+            MPI_Barrier(comm_);
+            double st = MPI_Wtime();
+
+            if (!isLCG) {
+                // set seed (declared as extern in utils)
+                seed = (unsigned)reseeder(1);
+
+#if defined(PRINT_RANDOM_XY_COORD)
+                for (int k = 0; k < nprocs_; k++) {
+                    if (k == rank_) {
+                        std::cout << "Random number generated on Process#" << k << " :" << std::endl;
+                        for (GraphElem i = 0; i < n_; i++) {
+                            X[i] = genRandom<GraphWeight>(0.0, 1.0);
+                            Y[i] = genRandom<GraphWeight>(lo, hi);
+                            std::cout << "X, Y: " << X[i] << ", " << Y[i] << std::endl;
+                        }
+                    }
+                    MPI_Barrier(comm_);
+                }
+#else
+                for (GraphElem i = 0; i < n_; i++) {
+                    //X[i] = genRandom<GraphWeight>(0.0, 1.0);
+                    //Y[i] = genRandom<GraphWeight>(lo, hi);
+                    X[i] = 1.0 * i / n_;
+                    Y[i] = lo + (hi - lo) * i / n_;
+                }
+#endif
+            }
+            else { // LCG
+                // X | Y
+                // e.g seeds: 1741, 3821
+                // create LCG object
+                // seed to generate x0
+                LCG xr(/*seed*/1, X.data(), 2*n_, comm_); 
+                
+                // generate random numbers between 0-1
+                xr.generate();
+
+                // rescale xr further between lo-hi
+                // and put the numbers in Y taking
+                // from X[n]
+                xr.rescale(Y.data(), n_, lo);
+
+#if defined(PRINT_RANDOM_XY_COORD)
+                for (int k = 0; k < nprocs_; k++) {
+                    if (k == rank_) {
+                        std::cout << "Random number generated on Process#" << k << " :" << std::endl;
+                        for (GraphElem i = 0; i < n_; i++) {
+                            std::cout << "X, Y: " << X[i] << ", " << Y[i] << std::endl;
+                        }
+                    }
+                    MPI_Barrier(comm_);
+                }
+#endif
+            }
+                 
+            double et = MPI_Wtime();
+            double tt = et - st;
+            double tot_tt = 0.0;
+            MPI_Reduce(&tt, &tot_tt, 1, MPI_DOUBLE, MPI_SUM, 0, comm_);
+                
+            if (rank_ == 0) {
+                double tot_avg = (tot_tt/nprocs_);
+                std::cout << "Average time to generate " << 2*n_ 
+                    << " random numbers (in s): " 
+                    << tot_avg << std::endl;
+            }
+
+            // ghost(s)
+            
+            // cross edges, each processor
+            // communicates with up and/or down
+            // neighbor only
+            std::vector<EdgeTuple> sendup_edges, senddn_edges; 
+            std::vector<EdgeTuple> recvup_edges, recvdn_edges;
+            std::vector<EdgeTuple> edgeList;
+            
+            // counts, indexing: [2] = {up - 0, down - 1}
+            // TODO can't we use MPI_INT 
+            std::array<GraphElem, 2> send_sizes = {0, 0}, recv_sizes = {0, 0};
+#if defined(CHECK_NUM_EDGES)
+            GraphElem numEdges = 0;
+#endif
+            // local
+            for (GraphElem i = 0; i < n_; i++) {
+                //for (GraphElem j = i + 1; j < n_; j++) {
+                for (GraphElem j = i + 1; j < n_ && j < i + 10; j++) {
+                    // euclidean distance:
+                    // 2D: sqrt((px-qx)^2 + (py-qy)^2)
+                    GraphWeight dx = X[i] - X[j];
+                    GraphWeight dy = Y[i] - Y[j];
+                    GraphWeight ed = sqrt(dx*dx + dy*dy);
+                    // are the two vertices within the range?
+                    if (ed <= rn_) {
+                        // local to global index
+                        const GraphElem g_i = g->local_to_global(i);
+                        const GraphElem g_j = g->local_to_global(j);
+
+                        if (!unitEdgeWeight) {
+                            edgeList.emplace_back(i, g_j, ed);
+                            edgeList.emplace_back(j, g_i, ed);
+                        }
+                        else {
+                            edgeList.emplace_back(i, g_j);
+                            edgeList.emplace_back(j, g_i);
+                        }
+#if defined(CHECK_NUM_EDGES)
+                        numEdges += 2;
+#endif
+
+                        g->edge_indices_[i+1]++;
+                        g->edge_indices_[j+1]++;
+                    }
+                }
+            }
+
+            MPI_Barrier(comm_);
+            
+            // communicate ghost coordinates with neighbors
+           
+            const int x_ndown   = X_down.empty() ? 0 : n_;
+            const int y_ndown   = Y_down.empty() ? 0 : n_;
+            const int x_nup     = X_up.empty() ? 0 : n_;
+            const int y_nup     = Y_up.empty() ? 0 : n_;
+
+            MPI_Sendrecv(X.data(), n_, MPI_WEIGHT_TYPE, up_, SR_X_UP_TAG, 
+                    X_down.data(), x_ndown, MPI_WEIGHT_TYPE, down_, SR_X_UP_TAG, 
+                    comm_, MPI_STATUS_IGNORE);
+            MPI_Sendrecv(X.data(), n_, MPI_WEIGHT_TYPE, down_, SR_X_DOWN_TAG, 
+                    X_up.data(), x_nup, MPI_WEIGHT_TYPE, up_, SR_X_DOWN_TAG, 
+                    comm_, MPI_STATUS_IGNORE);
+            MPI_Sendrecv(Y.data(), n_, MPI_WEIGHT_TYPE, up_, SR_Y_UP_TAG, 
+                    Y_down.data(), y_ndown, MPI_WEIGHT_TYPE, down_, SR_Y_UP_TAG, 
+                    comm_, MPI_STATUS_IGNORE);
+            MPI_Sendrecv(Y.data(), n_, MPI_WEIGHT_TYPE, down_, SR_Y_DOWN_TAG, 
+                    Y_up.data(), y_nup, MPI_WEIGHT_TYPE, up_, SR_Y_DOWN_TAG, 
+                    comm_, MPI_STATUS_IGNORE);
+                        
+            // exchange ghost vertices / cross edges
+            if (nprocs_ > 1) {
+                if (up_ != MPI_PROC_NULL) {
+                    
+                    for (GraphElem i = 0; i < n_; i++) {
+                        for (GraphElem j = i + 1; j < n_; j++) {
+                            GraphWeight dx = X[i] - X_up[j];
+                            GraphWeight dy = Y[i] - Y_up[j];
+                            GraphWeight ed = sqrt(dx*dx + dy*dy);
+                            
+                            if (ed <= rn_) {
+                                const GraphElem g_i = g->local_to_global(i);
+                                const GraphElem g_j = j + up_*n_;
+
+                                if (!unitEdgeWeight) {
+                                    sendup_edges.emplace_back(j, g_i, ed);
+                                    edgeList.emplace_back(i, g_j, ed);
+                                }
+                                else {
+                                    sendup_edges.emplace_back(j, g_i);
+                                    edgeList.emplace_back(i, g_j);
+                                }
+#if defined(CHECK_NUM_EDGES)
+                                numEdges++;
+#endif
+                                g->edge_indices_[i+1]++;
+                            }
+                        }
+                    }
+                    
+                    // send up sizes
+                    send_sizes[0] = sendup_edges.size();
+                }
+
+                if (down_ != MPI_PROC_NULL) {
+                    
+                    for (GraphElem i = 0; i < n_; i++) {
+                        for (GraphElem j = i + 1; j < n_; j++) {
+                            GraphWeight dx = X[i] - X_down[j];
+                            GraphWeight dy = Y[i] - Y_down[j];
+                            GraphWeight ed = sqrt(dx*dx + dy*dy);
+
+                            if (ed <= rn_) {
+                                const GraphElem g_i = g->local_to_global(i);
+                                const GraphElem g_j = j + down_*n_;
+
+                                if (!unitEdgeWeight) {
+                                    senddn_edges.emplace_back(j, g_i, ed);
+                                    edgeList.emplace_back(i, g_j, ed);
+                                }
+                                else {
+                                    senddn_edges.emplace_back(j, g_i);
+                                    edgeList.emplace_back(i, g_j);
+                                }
+#if defined(CHECK_NUM_EDGES)
+                                numEdges++;
+#endif
+                                g->edge_indices_[i+1]++;
+                            }
+                        }
+                    }
+                    
+                    // send down sizes
+                    send_sizes[1] = senddn_edges.size();
+                }
+            }
+            
+            MPI_Barrier(comm_);
+            
+            // communicate ghost vertices with neighbors
+            // send/recv buffer sizes
+            
+            MPI_Sendrecv(&send_sizes[0], 1, MPI_GRAPH_TYPE, up_, SR_SIZES_UP_TAG, 
+                    &recv_sizes[1], 1, MPI_GRAPH_TYPE, down_, SR_SIZES_UP_TAG, 
+                    comm_, MPI_STATUS_IGNORE);
+            MPI_Sendrecv(&send_sizes[1], 1, MPI_GRAPH_TYPE, down_, SR_SIZES_DOWN_TAG, 
+                    &recv_sizes[0], 1, MPI_GRAPH_TYPE, up_, SR_SIZES_DOWN_TAG, 
+                    comm_, MPI_STATUS_IGNORE);
+
+            // resize recv buffers
+            
+            if (recv_sizes[0] > 0)
+                recvup_edges.resize(recv_sizes[0]);
+            if (recv_sizes[1] > 0)
+                recvdn_edges.resize(recv_sizes[1]);
+             
+            // send/recv both up and down
+            
+            MPI_Sendrecv(sendup_edges.data(), send_sizes[0]*sizeof(struct EdgeTuple), MPI_BYTE, 
+                    up_, SR_UP_TAG, recvdn_edges.data(), recv_sizes[1]*sizeof(struct EdgeTuple), 
+                    MPI_BYTE, down_, SR_UP_TAG, comm_, MPI_STATUS_IGNORE);
+            MPI_Sendrecv(senddn_edges.data(), send_sizes[1]*sizeof(struct EdgeTuple), MPI_BYTE, 
+                    down_, SR_DOWN_TAG, recvup_edges.data(), recv_sizes[0]*sizeof(struct EdgeTuple), 
+                    MPI_BYTE, up_, SR_DOWN_TAG, comm_, MPI_STATUS_IGNORE);
+
+            // update local #edges
+            
+            // down
+            if (down_ != MPI_PROC_NULL) {
+                for (GraphElem i = 0; i < recv_sizes[1]; i++) {
+#if defined(CHECK_NUM_EDGES)
+                    numEdges++;
+#endif           
+                    if (!unitEdgeWeight)
+                        edgeList.emplace_back(recvdn_edges[i].ij_[0], recvdn_edges[i].ij_[1], recvdn_edges[i].w_);
+                    else
+                        edgeList.emplace_back(recvdn_edges[i].ij_[0], recvdn_edges[i].ij_[1]);
+                    g->edge_indices_[recvdn_edges[i].ij_[0]+1]++; 
+                } 
+            }
+
+            // up
+            if (up_ != MPI_PROC_NULL) {
+                for (GraphElem i = 0; i < recv_sizes[0]; i++) {
+#if defined(CHECK_NUM_EDGES)
+                    numEdges++;
+#endif
+                    if (!unitEdgeWeight)
+                        edgeList.emplace_back(recvup_edges[i].ij_[0], recvup_edges[i].ij_[1], recvup_edges[i].w_);
+                    else
+                        edgeList.emplace_back(recvup_edges[i].ij_[0], recvup_edges[i].ij_[1]);
+                    g->edge_indices_[recvup_edges[i].ij_[0]+1]++; 
+                }
+            }
+            
+            // add random edges based on 
+            // randomEdgePercent 
+            if (randomEdgePercent > 0) {
+                const GraphElem pnedges = (edgeList.size()/2);
+                GraphElem tot_pnedges = 0;
+
+                MPI_Allreduce(&pnedges, &tot_pnedges, 1, MPI_GRAPH_TYPE, MPI_SUM, comm_);
+                
+                // extra #edges per process
+                const GraphElem nrande = (((GraphElem)randomEdgePercent * tot_pnedges)/100);
+                GraphElem pnrande = 0; // zero-init: ranks that receive no share must not read an indeterminate value
+
+                // TODO FIXME try to ensure a fair edge distribution
+                if (nrande < nprocs_) {
+                    if (rank_ == (nprocs_ - 1))
+                        pnrande += nrande;
+                }
+                else {
+                    pnrande = nrande / nprocs_;
+                    const GraphElem pnrem = nrande % nprocs_;
+                    if (pnrem != 0) {
+                        if (rank_ == (nprocs_ - 1))
+                            pnrande += pnrem;
+                    }
+                }
+               
+                // add pnrande edges 
+
+                // send/recv buffers
+                std::vector<std::vector<EdgeTuple>> rand_edges(nprocs_); 
+                std::vector<EdgeTuple> sendrand_edges, recvrand_edges;
+
+                // outgoing/incoming send/recv sizes
+                std::vector<int> sendrand_sizes(nprocs_), recvrand_sizes(nprocs_);
+
+#if defined(PRINT_EXTRA_NEDGES)
+                int extraEdges = 0;
+#endif
+
+#if defined(DEBUG_PRINTF)
+                for (int i = 0; i < nprocs_; i++) {
+                    if (i == rank_) {
+                        std::cout << "[" << i << "]Target process for random edge insertion between " 
+                            << lo << " and " << hi << std::endl;
+                    }
+                    MPI_Barrier(comm_);
+                }
+#endif
+                // make sure each process has a 
+                // different seed this time since
+                // we want random edges
+                unsigned rande_seed = (unsigned)(time(0)^getpid());
+                GraphWeight weight = 1.0;
+                std::hash<GraphElem> reh;
+               
+                // cannot use genRandom if it's already been seeded
+                std::default_random_engine re(rande_seed); 
+                std::uniform_int_distribution<> IR, JR; 
+                std::uniform_real_distribution<> IJW; 
+ 
+                for (GraphElem k = 0; k < pnrande; k++) {
+
+                    // randomly pick start/end vertex and target from my list
+                    const GraphElem i = (GraphElem)IR(re, std::uniform_int_distribution<>::param_type{0, (int)(n_- 1)});
+                    const GraphElem g_j = (GraphElem)JR(re, std::uniform_int_distribution<>::param_type{0, (int)(nv_- 1)});
+                    const int target = g->get_owner(g_j);
+                    const GraphElem j = g->global_to_local(g_j, target); // local
+
+                    if ((target == rank_) && (i == j)) // self-loop only if the target vertex is owned locally
+                        continue;
+
+                    const GraphElem g_i = g->local_to_global(i);
+                    
+                    // check for duplicates prior to edgeList insertion
+                    auto found = std::find_if(edgeList.begin(), edgeList.end(), 
+                            [&](EdgeTuple const& et) 
+                            { return ((et.ij_[0] == i) && (et.ij_[1] == g_j)); });
+
+                    // OK to insert, not in list
+                    if (found == std::end(edgeList)) { 
+                   
+                        // calculate weight
+                        if (!unitEdgeWeight) {
+                            if (target == rank_) {
+                                GraphWeight dx = X[i] - X[j];
+                                GraphWeight dy = Y[i] - Y[j];
+                                weight = sqrt(dx*dx + dy*dy);
+                            }
+                            else if (target == up_) {
+                                GraphWeight dx = X[i] - X_up[j];
+                                GraphWeight dy = Y[i] - Y_up[j];
+                                weight = sqrt(dx*dx + dy*dy);
+                            }
+                            else if (target == down_) {
+                                GraphWeight dx = X[i] - X_down[j];
+                                GraphWeight dy = Y[i] - Y_down[j];
+                                weight = sqrt(dx*dx + dy*dy);
+                            }
+                            else {
+                                unsigned randw_seed = reh((GraphElem)(g_i*nv_+g_j));
+                                std::default_random_engine rew(randw_seed); 
+                                weight = (GraphWeight)IJW(rew, std::uniform_real_distribution<>::param_type{0.0, 1.0});
+                            }
+                        }
+
+                        rand_edges[target].emplace_back(j, g_i, weight);
+                        sendrand_sizes[target]++;
+
+#if defined(PRINT_EXTRA_NEDGES)
+                        extraEdges++;
+#endif
+#if defined(CHECK_NUM_EDGES)
+                        numEdges++;
+#endif                       
+                        edgeList.emplace_back(i, g_j, weight);
+                        g->edge_indices_[i+1]++;
+                    }
+                }
+                
+#if defined(PRINT_EXTRA_NEDGES)
+                int totExtraEdges = 0;
+                MPI_Reduce(&extraEdges, &totExtraEdges, 1, MPI_INT, MPI_SUM, 0, comm_);
+                if (rank_ == 0)
+                    std::cout << "Adding extra " << totExtraEdges << " edges while trying to incorporate " 
+                        << randomEdgePercent << "%" << " extra edges globally." << std::endl;
+#endif
+
+                MPI_Barrier(comm_);
+              
+                // communicate ghost edges
+                MPI_Request rande_sreq;
+
+                MPI_Ialltoall(sendrand_sizes.data(), 1, MPI_INT, 
+                        recvrand_sizes.data(), 1, MPI_INT, comm_, 
+                        &rande_sreq);
+
+                // pack per-target edge buffers into one contiguous send buffer
+                for (int p = 0; p < nprocs_; p++) {
+                    sendrand_edges.insert(sendrand_edges.end(), 
+                            rand_edges[p].begin(), rand_edges[p].end());
+                }
+
+                MPI_Wait(&rande_sreq, MPI_STATUS_IGNORE);
+               
+                // total recvbuffer size
+                const int rcount = std::accumulate(recvrand_sizes.begin(), recvrand_sizes.end(), 0);
+                recvrand_edges.resize(rcount);
+                                
+                // alltoallv for incoming data
+                // TODO FIXME make sure size of extra edges is 
+                // within INT limits
+               
+                int rpos = 0, spos = 0;
+                std::vector<int> sdispls(nprocs_), rdispls(nprocs_);
+                
+                for (int p = 0; p < nprocs_; p++) {
+
+                    sendrand_sizes[p] *= sizeof(struct EdgeTuple);
+                    recvrand_sizes[p] *= sizeof(struct EdgeTuple);
+                    
+                    sdispls[p] = spos;
+                    rdispls[p] = rpos;
+                    
+                    spos += sendrand_sizes[p];
+                    rpos += recvrand_sizes[p];
+                }
+                
+                MPI_Alltoallv(sendrand_edges.data(), sendrand_sizes.data(), sdispls.data(), 
+                        MPI_BYTE, recvrand_edges.data(), recvrand_sizes.data(), rdispls.data(), 
+                        MPI_BYTE, comm_);
+                
+                // update local edge list
+                for (int i = 0; i < rcount; i++) {
+#if defined(CHECK_NUM_EDGES)
+                    numEdges++;
+#endif
+                    edgeList.emplace_back(recvrand_edges[i].ij_[0], recvrand_edges[i].ij_[1], recvrand_edges[i].w_);
+                    g->edge_indices_[recvrand_edges[i].ij_[0]+1]++; 
+                }
+
+                sendrand_edges.clear();
+                recvrand_edges.clear();
+                rand_edges.clear();
+            } // end of (conditional) random edges addition
+
+            MPI_Barrier(comm_);
+  
+            // set graph edge indices
+            
+            std::vector<GraphElem> ecTmp(n_+1);
+            std::partial_sum(g->edge_indices_.begin(), g->edge_indices_.end(), ecTmp.begin());
+            g->edge_indices_ = ecTmp;
+             
+            for(GraphElem i = 1; i < n_+1; i++)
+                g->edge_indices_[i] -= g->edge_indices_[0];   
+            g->edge_indices_[0] = 0;
+
+            g->set_edge_index(0, 0);
+            for (GraphElem i = 0; i < n_; i++)
+                g->set_edge_index(i+1, g->edge_indices_[i+1]);
+            
+            const GraphElem nedges = g->edge_indices_[n_] - g->edge_indices_[0];
+            g->set_nedges(nedges);
+            
+            // set graph edge list
+            // sort edge list
+            auto ecmp = [] (EdgeTuple const& e0, EdgeTuple const& e1)
+            { return ((e0.ij_[0] < e1.ij_[0]) || ((e0.ij_[0] == e1.ij_[0]) && (e0.ij_[1] < e1.ij_[1]))); };
+
+            if (!std::is_sorted(edgeList.begin(), edgeList.end(), ecmp)) {
+#if defined(DEBUG_PRINTF)
+                std::cout << "Edge list is not sorted." << std::endl;
+#endif
+                std::sort(edgeList.begin(), edgeList.end(), ecmp);
+            }
+#if defined(DEBUG_PRINTF)
+            else
+                std::cout << "Edge list is sorted!" << std::endl;
+#endif
+  
+            GraphElem ePos = 0;
+            for (GraphElem i = 0; i < n_; i++) {
+                GraphElem e0, e1;
+
+                g->edge_range(i, e0, e1);
+#if defined(DEBUG_PRINTF)
+                if ((i % 100000) == 0)
+                    std::cout << "Processing edges for vertex: " << i << ", range(" << e0 << ", " << e1 <<
+                        ")" << std::endl;
+#endif
+                for (GraphElem j = e0; j < e1; j++) {
+                    Edge &edge = g->set_edge(j);
+
+                    assert(ePos == j);
+                    assert(i == edgeList[ePos].ij_[0]);
+                    
+                    edge.tail_ = edgeList[ePos].ij_[1];
+                    edge.weight_ = edgeList[ePos].w_;
+
+                    ePos++;
+                }
+            }
+            
+#if defined(CHECK_NUM_EDGES)
+            GraphElem tot_numEdges = 0;
+            MPI_Allreduce(&numEdges, &tot_numEdges, 1, MPI_GRAPH_TYPE, MPI_SUM, comm_);
+            const GraphElem tne = g->get_ne();
+            assert(tne == tot_numEdges);
+#endif
+            edgeList.clear();
+            
+            X.clear();
+            Y.clear();
+            X_up.clear();
+            Y_up.clear();
+            X_down.clear();
+            Y_down.clear();
+
+            sendup_edges.clear();
+            senddn_edges.clear();
+            recvup_edges.clear();
+            recvdn_edges.clear();
+
+            return g;
+        }
+
+        GraphWeight get_d() const { return rn_; }
+        GraphElem get_nv() const { return nv_; }
+
+    private:
+        GraphElem nv_, n_;
+        GraphWeight rn_;
+        MPI_Comm comm_;
+        int nprocs_, rank_, up_, down_;
+};
+
+#endif
diff --git a/miniVite/log b/miniVite/log
new file mode 100644
index 0000000..19ee514
--- /dev/null
+++ b/miniVite/log
@@ -0,0 +1,518 @@
+My libomptarget --> Set mode to SDEV
+==168619== NVPROF is profiling process 168619, command: ./miniVite -n 50000000
+Average time to generate 100000000 random numbers using LCG (in s): 0.182072
+**********************************************************************
+Generated Random Geometric Graph with d: 0.000269794
+Number of vertices: 50000000
+Number of edges: 899999910
+Time to generate distributed graph of 50000000 vertices (in s): 88.1166
+Size: 16 : 8
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002004c0000000, size=400000008
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002004d7e00000, size=16
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002004e0000000, size=14399998560
+My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200840000000, size=400000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200860000000, size=400000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200880000000, size=400000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002008a0000000, size=800000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002008e0000000, size=800000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200920000000, size=400000000
+My libomptarget --> COMPUTE (0x00000000100166f8)	(#iter: 50000000    device: 0    UM: 0) at 1
+My libomptarget -->   Map 0x0000200920000000 to soft device, size=400000000
+My libomptarget -->   Apply opt 4 to 0x0000200920000000
+My libomptarget -->   Apply opt 1 to 0x0000200920000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002008e0000000 to soft device, size=800000000
+My libomptarget -->   Apply opt 4 to 0x00002008e0000000
+My libomptarget -->   Apply opt 1 to 0x00002008e0000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x00002008e0000000 from soft device (0x00002008e0000000), size=800000000
+My libomptarget -->   Unmap 0x0000200920000000 from soft device (0x0000200920000000), size=400000000
+My libomptarget --> COMPUTE (0x0000000010016724)	(#iter: 50000000    device: 0    UM: 0) at 2
+My libomptarget -->   Map 0x00002004c0000000 to soft device, size=400000008
+My libomptarget -->   Apply opt 4 to 0x00002004c0000000
+My libomptarget -->   Apply opt 1 to 0x00002004c0000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002004d7e00000 to soft device, size=16
+My libomptarget -->   Apply opt 4 to 0x00002004d7e00000
+My libomptarget -->   Apply opt 1 to 0x00002004d7e00000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002004e0000000 to soft device, size=14399998560
+My libomptarget -->   Apply opt 4 to 0x00002004e0000000
+My libomptarget -->   Apply opt 1 to 0x00002004e0000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x0000200840000000 to soft device, size=400000000
+My libomptarget -->   Apply opt 4 to 0x0000200840000000
+My libomptarget -->   Apply opt 1 to 0x0000200840000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x0000200880000000 to soft device, size=400000000
+My libomptarget -->   Apply opt 4 to 0x0000200880000000
+My libomptarget -->   Apply opt 1 to 0x0000200880000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x0000200860000000 to soft device, size=400000000
+My libomptarget -->   Apply opt 4 to 0x0000200860000000
+My libomptarget -->   Apply opt 1 to 0x0000200860000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002008a0000000 to soft device, size=800000000
+My libomptarget -->   Apply opt 4 to 0x00002008a0000000
+My libomptarget -->   Apply opt 1 to 0x00002008a0000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002008e0000000 to soft device, size=800000000
+My libomptarget -->   Apply opt 4 to 0x00002008e0000000
+My libomptarget -->   Apply opt 1 to 0x00002008e0000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x0000200920000000 to soft device, size=400000000
+My libomptarget -->   Apply opt 4 to 0x0000200920000000
+My libomptarget -->   Apply opt 1 to 0x0000200920000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x0000200920000000 from soft device (0x0000200920000000), size=400000000
+My libomptarget -->   Unmap 0x00002008e0000000 from soft device (0x00002008e0000000), size=800000000
+My libomptarget -->   Unmap 0x00002008a0000000 from soft device (0x00002008a0000000), size=800000000
+My libomptarget -->   Unmap 0x0000200860000000 from soft device (0x0000200860000000), size=400000000
+My libomptarget -->   Unmap 0x0000200880000000 from soft device (0x0000200880000000), size=400000000
+My libomptarget -->   Unmap 0x0000200840000000 from soft device (0x0000200840000000), size=400000000
+My libomptarget -->   Unmap 0x00002004e0000000 from soft device (0x00002004e0000000), size=14399998560
+My libomptarget -->   Unmap 0x00002004d7e00000 from soft device (0x00002004d7e00000), size=16
+My libomptarget -->   Unmap 0x00002004c0000000 from soft device (0x00002004c0000000), size=400000008
+My libomptarget --> COMPUTE (0x00000000100166d9)	(#iter: 50000000    device: 0    UM: 0) at 3
+My libomptarget -->   Map 0x00002008a0000000 to soft device, size=800000000
+My libomptarget -->   Apply opt 4 to 0x00002008a0000000
+My libomptarget -->   Apply opt 1 to 0x00002008a0000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002008e0000000 to soft device, size=800000000
+My libomptarget -->   Apply opt 4 to 0x00002008e0000000
+My libomptarget -->   Apply opt 1 to 0x00002008e0000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x00002008e0000000 from soft device (0x00002008e0000000), size=800000000
+My libomptarget -->   Unmap 0x00002008a0000000 from soft device (0x00002008a0000000), size=800000000
+My libomptarget --> COMPUTE (0x00000000100166d8)	(#iter: 50000000    device: 0    UM: 0) at 4
+My libomptarget -->   Map 0x00007ffff20be9a8 to device (0x0000200997600000), size=8
+My libomptarget -->   Submit 0x00007ffff20be9a8 to 0x0000200997600000, size=8
+My libomptarget -->   Map 0x0000200920000000 to soft device, size=400000000
+My libomptarget -->   Apply opt 4 to 0x0000200920000000
+My libomptarget -->   Apply opt 1 to 0x0000200920000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00007ffff20be9a0 to device (0x0000200997600200), size=8
+My libomptarget -->   Submit 0x00007ffff20be9a0 to 0x0000200997600200, size=8
+My libomptarget -->   Map 0x00002008a0000000 to soft device, size=800000000
+My libomptarget -->   Apply opt 4 to 0x00002008a0000000
+My libomptarget -->   Apply opt 1 to 0x00002008a0000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x00002008a0000000 from soft device (0x00002008a0000000), size=800000000
+My libomptarget -->   Retrieve 0x00007ffff20be9a0 from 0x0000200997600200, size=8
+My libomptarget -->   Unmap 0x00007ffff20be9a0 from device (0x0000200997600200), size=8
+My libomptarget -->   Unmap 0x0000200920000000 from soft device (0x0000200920000000), size=400000000
+My libomptarget -->   Retrieve 0x00007ffff20be9a8 from 0x0000200997600000, size=8
+My libomptarget -->   Unmap 0x00007ffff20be9a8 from device (0x0000200997600000), size=8
+My libomptarget --> COMPUTE (0x00000000100166f8)	(#iter: 50000000    device: 0    UM: 0) at 5
+My libomptarget -->   Map 0x0000200920000000 to soft device, size=400000000
+My libomptarget -->   Apply opt 4 to 0x0000200920000000
+My libomptarget -->   Apply opt 1 to 0x0000200920000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002008e0000000 to soft device, size=800000000
+My libomptarget -->   Apply opt 4 to 0x00002008e0000000
+My libomptarget -->   Apply opt 1 to 0x00002008e0000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x00002008e0000000 from soft device (0x00002008e0000000), size=800000000
+My libomptarget -->   Unmap 0x0000200920000000 from soft device (0x0000200920000000), size=400000000
+My libomptarget --> COMPUTE (0x0000000010016724)	(#iter: 50000000    device: 0    UM: 0) at 6
+My libomptarget -->   Map 0x00002004c0000000 to soft device, size=400000008
+My libomptarget -->   Apply opt 4 to 0x00002004c0000000
+My libomptarget -->   Apply opt 1 to 0x00002004c0000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002004d7e00000 to soft device, size=16
+My libomptarget -->   Apply opt 4 to 0x00002004d7e00000
+My libomptarget -->   Apply opt 1 to 0x00002004d7e00000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002004e0000000 to soft device, size=14399998560
+My libomptarget -->   Apply opt 4 to 0x00002004e0000000
+My libomptarget -->   Apply opt 1 to 0x00002004e0000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x0000200840000000 to soft device, size=400000000
+My libomptarget -->   Apply opt 4 to 0x0000200840000000
+My libomptarget -->   Apply opt 1 to 0x0000200840000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x0000200880000000 to soft device, size=400000000
+My libomptarget -->   Apply opt 4 to 0x0000200880000000
+My libomptarget -->   Apply opt 1 to 0x0000200880000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x0000200860000000 to soft device, size=400000000
+My libomptarget -->   Apply opt 4 to 0x0000200860000000
+My libomptarget -->   Apply opt 1 to 0x0000200860000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002008a0000000 to soft device, size=800000000
+My libomptarget -->   Apply opt 4 to 0x00002008a0000000
+My libomptarget -->   Apply opt 1 to 0x00002008a0000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002008e0000000 to soft device, size=800000000
+My libomptarget -->   Apply opt 4 to 0x00002008e0000000
+My libomptarget -->   Apply opt 1 to 0x00002008e0000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x0000200920000000 to soft device, size=400000000
+My libomptarget -->   Apply opt 4 to 0x0000200920000000
+My libomptarget -->   Apply opt 1 to 0x0000200920000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x0000200920000000 from soft device (0x0000200920000000), size=400000000
+My libomptarget -->   Unmap 0x00002008e0000000 from soft device (0x00002008e0000000), size=800000000
+My libomptarget -->   Unmap 0x00002008a0000000 from soft device (0x00002008a0000000), size=800000000
+My libomptarget -->   Unmap 0x0000200860000000 from soft device (0x0000200860000000), size=400000000
+My libomptarget -->   Unmap 0x0000200880000000 from soft device (0x0000200880000000), size=400000000
+My libomptarget -->   Unmap 0x0000200840000000 from soft device (0x0000200840000000), size=400000000
+My libomptarget -->   Unmap 0x00002004e0000000 from soft device (0x00002004e0000000), size=14399998560
+My libomptarget -->   Unmap 0x00002004d7e00000 from soft device (0x00002004d7e00000), size=16
+My libomptarget -->   Unmap 0x00002004c0000000 from soft device (0x00002004c0000000), size=400000008
+My libomptarget --> COMPUTE (0x00000000100166d9)	(#iter: 50000000    device: 0    UM: 0) at 7
+My libomptarget -->   Map 0x00002008a0000000 to soft device, size=800000000
+My libomptarget -->   Apply opt 4 to 0x00002008a0000000
+My libomptarget -->   Apply opt 1 to 0x00002008a0000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002008e0000000 to soft device, size=800000000
+My libomptarget -->   Apply opt 4 to 0x00002008e0000000
+My libomptarget -->   Apply opt 1 to 0x00002008e0000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x00002008e0000000 from soft device (0x00002008e0000000), size=800000000
+My libomptarget -->   Unmap 0x00002008a0000000 from soft device (0x00002008a0000000), size=800000000
+My libomptarget --> COMPUTE (0x00000000100166d8)	(#iter: 50000000    device: 0    UM: 0) at 8
+My libomptarget -->   Map 0x00007ffff20be9a8 to device (0x0000200997600000), size=8
+My libomptarget -->   Submit 0x00007ffff20be9a8 to 0x0000200997600000, size=8
+My libomptarget -->   Map 0x0000200920000000 to soft device, size=400000000
+My libomptarget -->   Apply opt 4 to 0x0000200920000000
+My libomptarget -->   Apply opt 1 to 0x0000200920000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00007ffff20be9a0 to device (0x0000200997600200), size=8
+My libomptarget -->   Submit 0x00007ffff20be9a0 to 0x0000200997600200, size=8
+My libomptarget -->   Map 0x00002008a0000000 to soft device, size=800000000
+My libomptarget -->   Apply opt 4 to 0x00002008a0000000
+My libomptarget -->   Apply opt 1 to 0x00002008a0000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x00002008a0000000 from soft device (0x00002008a0000000), size=800000000
+My libomptarget -->   Retrieve 0x00007ffff20be9a0 from 0x0000200997600200, size=8
+My libomptarget -->   Unmap 0x00007ffff20be9a0 from device (0x0000200997600200), size=8
+My libomptarget -->   Unmap 0x0000200920000000 from soft device (0x0000200920000000), size=400000000
+My libomptarget -->   Retrieve 0x00007ffff20be9a8 from 0x0000200997600000, size=8
+My libomptarget -->   Unmap 0x00007ffff20be9a8 from device (0x0000200997600000), size=8
+Total size: 16.7638
+Time: 80.839
+Modularity: -2e-08, Iterations: 2, Time (in s): 91.9998
+**********************************************************************
+==168619== Profiling application: ./miniVite -n 50000000
+==168619== Profiling result:
+            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
+ GPU activities:   88.78%  70.8005s         2  35.4002s  13.8961s  56.9044s  __omp_offloading_35_eeedb51__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1367
+                    7.38%  5.88504s         2  2.94252s  2.90431s  2.98072s  __omp_offloading_35_eeedb51__Z20distUpdateLocalCinfolP4CommPKS__l435
+                    3.45%  2.75114s         2  1.37557s  1.31162s  1.43952s  __omp_offloading_35_eeedb51__Z21distComputeModularityRK5GraphP4CommPKddi_l395
+                    0.39%  309.96ms         2  154.98ms  2.5277ms  307.43ms  __omp_offloading_35_eeedb51__Z16distCleanCWandCUlPdP4Comm_l454
+                    0.00%  14.176us         8  1.7720us  1.6320us  1.8880us  [CUDA memcpy DtoH]
+                    0.00%  7.2000us         5  1.4400us  1.2480us  1.7280us  [CUDA memcpy HtoD]
+      API calls:   98.42%  79.7474s         8  9.96842s  2.6219ms  56.9045s  cuCtxSynchronize
+                    0.80%  645.71ms         1  645.71ms  645.71ms  645.71ms  cuCtxDestroy
+                    0.65%  523.16ms         1  523.16ms  523.16ms  523.16ms  cuCtxCreate
+                    0.07%  55.862ms         4  13.965ms  26.356us  29.241ms  cuMemAlloc
+                    0.03%  21.165ms         9  2.3517ms  69.686us  20.311ms  cuMemAllocManaged
+                    0.02%  14.921ms        30  497.36us  3.8050us  10.727ms  cuMemAdvise
+                    0.01%  11.501ms         1  11.501ms  11.501ms  11.501ms  cuModuleLoadDataEx
+                    0.00%  3.5211ms         8  440.13us  33.118us  3.1674ms  cuLaunchKernel
+                    0.00%  3.4017ms         1  3.4017ms  3.4017ms  3.4017ms  cuModuleUnload
+                    0.00%  1.0287ms         4  257.18us  27.296us  488.97us  cuMemFree
+                    0.00%  610.48us         8  76.310us  53.527us  105.83us  cuMemcpyDtoH
+                    0.00%  233.54us         5  46.707us  27.817us  70.651us  cuMemcpyHtoD
+                    0.00%  62.364us        34  1.8340us     555ns  4.7440us  cuCtxSetCurrent
+                    0.00%  18.389us         8  2.2980us  1.7010us  2.9630us  cuFuncGetAttribute
+                    0.00%  13.046us        21     621ns     338ns  1.0360us  cuDeviceGetAttribute
+                    0.00%  12.614us         5  2.5220us  2.3040us  3.1820us  cuModuleGetGlobal
+                    0.00%  11.860us         6  1.9760us  1.2010us  4.5830us  cuDeviceGetPCIBusId
+                    0.00%  8.3770us         7  1.1960us     592ns  4.2380us  cuDeviceGet
+                    0.00%  7.7370us         4  1.9340us  1.5450us  2.9960us  cuModuleGetFunction
+                    0.00%  1.5730us         3     524ns     403ns     588ns  cuDeviceGetCount
+
+==168619== Unified Memory profiling result:
+Device "Tesla V100-SXM2-16GB (0)"
+   Count  Avg Size  Min Size  Max Size  Total Size  Total Time  Name
+  210990  169.82KB  64.000KB  1.6250MB  34.16998GB   1.252414s  Host To Device
+    9314  1.9949MB  64.000KB  2.0000MB  18.14508GB  433.0975ms  Device To Host
+   31726         -         -         -           -  79.700933s  Gpu page fault groups
+     319  1.9969MB  1.5000MB  2.0000MB  637.0000MB           -  Remote mapping to device
+Total CPU Page faults: 51896
+Total remote mappings from CPU: 319
+
+------------------------------------------------------------
+Sender: LSF System <lsfadmin@batch3>
+Subject: Job 310708: <km> in cluster <summit> Done
+
+Job <km> was submitted from host <login1> by user <lld> in cluster <summit> at Wed Mar 27 16:42:25 2019
+Job was executed on host(s) <1*batch3>, in queue <batch>, as user <lld> in cluster <summit> at Wed Mar 27 16:42:35 2019
+                            <42*g33n07>
+</ccs/home/lld> was used as the home directory.
+</ccs/home/lld/apps/miniVite> was used as the working directory.
+Started at Wed Mar 27 16:42:35 2019
+Terminated at Wed Mar 27 16:45:46 2019
+Results reported at Wed Mar 27 16:45:46 2019
+
+The output (if any) is above this job summary.
+
+My libomptarget --> Set mode to SDEV
+==114131== NVPROF is profiling process 114131, command: ./miniVite -n 5000000
+Average time to generate 10000000 random numbers using LCG (in s): 0.0181821
+**********************************************************************
+Generated Random Geometric Graph with d: 0.000817469
+Number of vertices: 5000000
+Number of edges: 89999910
+Time to generate distributed graph of 5000000 vertices (in s): 8.43148
+Size: 16 : 8
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e0000000, size=40000008
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e2800000, size=16
+My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200120000000, size=1439998560
+My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200175e00000, size=40000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200178600000, size=40000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x000020017ae00000, size=40000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e2a00000, size=80000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e7800000, size=80000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x000020017d600000, size=40000000
+My libomptarget --> COMPUTE (0x00000000100166f8)	(#iter: 5000000    device: 0    UM: 0) at 1
+My libomptarget -->   Map 0x000020017d600000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x000020017d600000
+My libomptarget -->   Apply opt 1 to 0x000020017d600000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002000e7800000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e7800000
+My libomptarget -->   Apply opt 1 to 0x00002000e7800000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000
+My libomptarget -->   Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000
+My libomptarget --> COMPUTE (0x0000000010016724)	(#iter: 5000000    device: 0    UM: 0) at 2
+My libomptarget -->   Map 0x00002000e0000000 to soft device, size=40000008
+My libomptarget -->   Apply opt 4 to 0x00002000e0000000
+My libomptarget -->   Apply opt 1 to 0x00002000e0000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002000e2800000 to soft device, size=16
+My libomptarget -->   Apply opt 4 to 0x00002000e2800000
+My libomptarget -->   Apply opt 1 to 0x00002000e2800000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x0000200120000000 to soft device, size=1439998560
+My libomptarget -->   Apply opt 4 to 0x0000200120000000
+My libomptarget -->   Apply opt 1 to 0x0000200120000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x0000200175e00000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x0000200175e00000
+My libomptarget -->   Apply opt 1 to 0x0000200175e00000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x000020017ae00000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x000020017ae00000
+My libomptarget -->   Apply opt 1 to 0x000020017ae00000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x0000200178600000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x0000200178600000
+My libomptarget -->   Apply opt 1 to 0x0000200178600000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002000e2a00000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e2a00000
+My libomptarget -->   Apply opt 1 to 0x00002000e2a00000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002000e7800000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e7800000
+My libomptarget -->   Apply opt 1 to 0x00002000e7800000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x000020017d600000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x000020017d600000
+My libomptarget -->   Apply opt 1 to 0x000020017d600000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000
+My libomptarget -->   Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000
+My libomptarget -->   Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000
+My libomptarget -->   Unmap 0x0000200178600000 from soft device (0x0000200178600000), size=40000000
+My libomptarget -->   Unmap 0x000020017ae00000 from soft device (0x000020017ae00000), size=40000000
+My libomptarget -->   Unmap 0x0000200175e00000 from soft device (0x0000200175e00000), size=40000000
+My libomptarget -->   Unmap 0x0000200120000000 from soft device (0x0000200120000000), size=1439998560
+My libomptarget -->   Unmap 0x00002000e2800000 from soft device (0x00002000e2800000), size=16
+My libomptarget -->   Unmap 0x00002000e0000000 from soft device (0x00002000e0000000), size=40000008
+My libomptarget --> COMPUTE (0x00000000100166d9)	(#iter: 5000000    device: 0    UM: 0) at 3
+My libomptarget -->   Map 0x00002000e2a00000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e2a00000
+My libomptarget -->   Apply opt 1 to 0x00002000e2a00000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002000e7800000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e7800000
+My libomptarget -->   Apply opt 1 to 0x00002000e7800000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000
+My libomptarget -->   Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000
+My libomptarget --> COMPUTE (0x00000000100166d8)	(#iter: 5000000    device: 0    UM: 0) at 4
+My libomptarget -->   Map 0x00007fffe1e06888 to device (0x00002001d7600000), size=8
+My libomptarget -->   Submit 0x00007fffe1e06888 to 0x00002001d7600000, size=8
+My libomptarget -->   Map 0x000020017d600000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x000020017d600000
+My libomptarget -->   Apply opt 1 to 0x000020017d600000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00007fffe1e06880 to device (0x00002001d7600200), size=8
+My libomptarget -->   Submit 0x00007fffe1e06880 to 0x00002001d7600200, size=8
+My libomptarget -->   Map 0x00002000e2a00000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e2a00000
+My libomptarget -->   Apply opt 1 to 0x00002000e2a00000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000
+My libomptarget -->   Retrieve 0x00007fffe1e06880 from 0x00002001d7600200, size=8
+My libomptarget -->   Unmap 0x00007fffe1e06880 from device (0x00002001d7600200), size=8
+My libomptarget -->   Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000
+My libomptarget -->   Retrieve 0x00007fffe1e06888 from 0x00002001d7600000, size=8
+My libomptarget -->   Unmap 0x00007fffe1e06888 from device (0x00002001d7600000), size=8
+My libomptarget --> COMPUTE (0x00000000100166f8)	(#iter: 5000000    device: 0    UM: 0) at 5
+My libomptarget -->   Map 0x000020017d600000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x000020017d600000
+My libomptarget -->   Apply opt 1 to 0x000020017d600000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002000e7800000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e7800000
+My libomptarget -->   Apply opt 1 to 0x00002000e7800000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000
+My libomptarget -->   Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000
+My libomptarget --> COMPUTE (0x0000000010016724)	(#iter: 5000000    device: 0    UM: 0) at 6
+My libomptarget -->   Map 0x00002000e0000000 to soft device, size=40000008
+My libomptarget -->   Apply opt 4 to 0x00002000e0000000
+My libomptarget -->   Apply opt 1 to 0x00002000e0000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002000e2800000 to soft device, size=16
+My libomptarget -->   Apply opt 4 to 0x00002000e2800000
+My libomptarget -->   Apply opt 1 to 0x00002000e2800000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x0000200120000000 to soft device, size=1439998560
+My libomptarget -->   Apply opt 4 to 0x0000200120000000
+My libomptarget -->   Apply opt 1 to 0x0000200120000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x0000200175e00000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x0000200175e00000
+My libomptarget -->   Apply opt 1 to 0x0000200175e00000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x000020017ae00000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x000020017ae00000
+My libomptarget -->   Apply opt 1 to 0x000020017ae00000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x0000200178600000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x0000200178600000
+My libomptarget -->   Apply opt 1 to 0x0000200178600000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002000e2a00000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e2a00000
+My libomptarget -->   Apply opt 1 to 0x00002000e2a00000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002000e7800000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e7800000
+My libomptarget -->   Apply opt 1 to 0x00002000e7800000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x000020017d600000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x000020017d600000
+My libomptarget -->   Apply opt 1 to 0x000020017d600000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000
+My libomptarget -->   Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000
+My libomptarget -->   Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000
+My libomptarget -->   Unmap 0x0000200178600000 from soft device (0x0000200178600000), size=40000000
+My libomptarget -->   Unmap 0x000020017ae00000 from soft device (0x000020017ae00000), size=40000000
+My libomptarget -->   Unmap 0x0000200175e00000 from soft device (0x0000200175e00000), size=40000000
+My libomptarget -->   Unmap 0x0000200120000000 from soft device (0x0000200120000000), size=1439998560
+My libomptarget -->   Unmap 0x00002000e2800000 from soft device (0x00002000e2800000), size=16
+My libomptarget -->   Unmap 0x00002000e0000000 from soft device (0x00002000e0000000), size=40000008
+My libomptarget --> COMPUTE (0x00000000100166d9)	(#iter: 5000000    device: 0    UM: 0) at 7
+My libomptarget -->   Map 0x00002000e2a00000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e2a00000
+My libomptarget -->   Apply opt 1 to 0x00002000e2a00000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002000e7800000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e7800000
+My libomptarget -->   Apply opt 1 to 0x00002000e7800000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000
+My libomptarget -->   Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000
+My libomptarget --> COMPUTE (0x00000000100166d8)	(#iter: 5000000    device: 0    UM: 0) at 8
+My libomptarget -->   Map 0x00007fffe1e06888 to device (0x00002001d7600000), size=8
+My libomptarget -->   Submit 0x00007fffe1e06888 to 0x00002001d7600000, size=8
+My libomptarget -->   Map 0x000020017d600000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x000020017d600000
+My libomptarget -->   Apply opt 1 to 0x000020017d600000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00007fffe1e06880 to device (0x00002001d7600200), size=8
+My libomptarget -->   Submit 0x00007fffe1e06880 to 0x00002001d7600200, size=8
+My libomptarget -->   Map 0x00002000e2a00000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e2a00000
+My libomptarget -->   Apply opt 1 to 0x00002000e2a00000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000
+My libomptarget -->   Retrieve 0x00007fffe1e06880 from 0x00002001d7600200, size=8
+My libomptarget -->   Unmap 0x00007fffe1e06880 from device (0x00002001d7600200), size=8
+My libomptarget -->   Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000
+My libomptarget -->   Retrieve 0x00007fffe1e06888 from 0x00002001d7600000, size=8
+My libomptarget -->   Unmap 0x00007fffe1e06888 from device (0x00002001d7600000), size=8
+Total size: 1.67638
+Time: 0.743411
+Modularity: 8.22212e-07, Iterations: 2, Time (in s): 2.44237
+**********************************************************************
+==114131== Profiling application: ./miniVite -n 5000000
+==114131== Profiling result:
+            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
+ GPU activities:   93.44%  474.32ms         2  237.16ms  21.093ms  453.23ms  __omp_offloading_35_eeedb51__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1367
+                    6.30%  31.997ms         2  15.999ms  253.70us  31.743ms  __omp_offloading_35_eeedb51__Z16distCleanCWandCUlPdP4Comm_l454
+                    0.13%  650.77us         2  325.38us  324.62us  326.15us  __omp_offloading_35_eeedb51__Z21distComputeModularityRK5GraphP4CommPKddi_l395
+                    0.12%  631.82us         2  315.91us  314.70us  317.13us  __omp_offloading_35_eeedb51__Z20distUpdateLocalCinfolP4CommPKS__l435
+                    0.00%  14.401us         8  1.8000us  1.6320us  2.0160us  [CUDA memcpy DtoH]
+                    0.00%  7.0080us         5  1.4010us  1.2480us  1.6000us  [CUDA memcpy HtoD]
+      API calls:   40.00%  518.67ms         1  518.67ms  518.67ms  518.67ms  cuCtxCreate
+                   39.20%  508.28ms         8  63.536ms  356.70us  453.31ms  cuCtxSynchronize
+                   16.52%  214.18ms         1  214.18ms  214.18ms  214.18ms  cuCtxDestroy
+                    1.62%  21.061ms         9  2.3401ms  80.485us  20.281ms  cuMemAllocManaged
+                    0.89%  11.579ms         1  11.579ms  11.579ms  11.579ms  cuModuleLoadDataEx
+                    0.56%  7.2959ms         1  7.2959ms  7.2959ms  7.2959ms  cuModuleUnload
+                    0.51%  6.6057ms         8  825.71us  28.769us  6.3330ms  cuLaunchKernel
+                    0.24%  3.0631ms         4  765.77us  23.171us  1.5982ms  cuMemAlloc
+                    0.23%  3.0404ms        30  101.35us  3.7320us  2.0698ms  cuMemAdvise
+                    0.12%  1.5644ms         4  391.11us  30.221us  753.34us  cuMemFree
+                    0.07%  889.41us         8  111.18us  89.358us  125.11us  cuMemcpyDtoH
+                    0.01%  193.93us         5  38.786us  26.534us  53.105us  cuMemcpyHtoD
+                    0.00%  59.670us        34  1.7550us     676ns  4.4900us  cuCtxSetCurrent
+                    0.00%  14.464us         8  1.8080us  1.2480us  2.4810us  cuFuncGetAttribute
+                    0.00%  13.321us         5  2.6640us  2.1600us  3.6020us  cuModuleGetGlobal
+                    0.00%  12.168us         6  2.0280us  1.1920us  5.1090us  cuDeviceGetPCIBusId
+                    0.00%  11.903us        21     566ns     293ns     949ns  cuDeviceGetAttribute
+                    0.00%  8.5710us         7  1.2240us     596ns  4.3550us  cuDeviceGet
+                    0.00%  7.0220us         4  1.7550us  1.4670us  2.4140us  cuModuleGetFunction
+                    0.00%  1.5720us         3     524ns     375ns     609ns  cuDeviceGetCount
+
+==114131== Unified Memory profiling result:
+Device "Tesla V100-SXM2-16GB (0)"
+   Count  Avg Size  Min Size  Max Size  Total Size  Total Time  Name
+    5614  164.73KB  64.000KB  1.2500MB  903.1250MB  33.28173ms  Host To Device
+     692         -         -         -           -  226.4184ms  Gpu page fault groups
+      40  1.9094MB  192.00KB  2.0000MB  76.37500MB           -  Remote mapping to device
+Total CPU Page faults: 5212
+Total remote mappings from CPU: 40
+
+------------------------------------------------------------
+Sender: LSF System <lsfadmin@batch2>
+Subject: Job 310715: <km> in cluster <summit> Done
+
+Job <km> was submitted from host <login1> by user <lld> in cluster <summit> at Wed Mar 27 16:46:44 2019
+Job was executed on host(s) <1*batch2>, in queue <batch>, as user <lld> in cluster <summit> at Wed Mar 27 16:46:54 2019
+                            <42*g31n10>
+</ccs/home/lld> was used as the home directory.
+</ccs/home/lld/apps/miniVite> was used as the working directory.
+Started at Wed Mar 27 16:46:54 2019
+Terminated at Wed Mar 27 16:47:11 2019
+Results reported at Wed Mar 27 16:47:11 2019
+
+The output (if any) is above this job summary.
+
+
+
+------------------------------------------------------------
+Sender: LSF System <lsfadmin@batch1>
+Subject: Job 311049: <mnV> in cluster <summit> Done
+
+Job <mnV> was submitted from host <login3> by user <lld> in cluster <summit> at Wed Mar 27 22:00:55 2019
+Job was executed on host(s) <1*batch1>, in queue <batch>, as user <lld> in cluster <summit> at Wed Mar 27 22:01:04 2019
+                            <42*a32n16>
+</ccs/home/lld> was used as the home directory.
+</ccs/home/lld/apps/miniVite> was used as the working directory.
+Started at Wed Mar 27 22:01:04 2019
+Terminated at Wed Mar 27 22:01:06 2019
+Results reported at Wed Mar 27 22:01:06 2019
+
+The output (if any) is above this job summary.
+
diff --git a/miniVite/log1 b/miniVite/log1
new file mode 100644
index 0000000..aa97cdb
--- /dev/null
+++ b/miniVite/log1
@@ -0,0 +1,598 @@
+My libomptarget --> Set mode to DEV
+==158920== NVPROF is profiling process 158920, command: ./miniVite -n 5000000
+Average time to generate 10000000 random numbers using LCG (in s): 0.0181787
+**********************************************************************
+Generated Random Geometric Graph with d: 0.000817469
+Number of vertices: 5000000
+Number of edges: 89999910
+Time to generate distributed graph of 5000000 vertices (in s): 8.47192
+Size: 16 : 8
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e0000000, size=40000008
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e2800000, size=16
+My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200100000000, size=1439998560
+My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200155e00000, size=40000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200158600000, size=40000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x000020015ae00000, size=40000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e2a00000, size=80000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e7800000, size=80000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x000020015d600000, size=40000000
+My libomptarget --> COMPUTE (0x00000000100166f8)	(#iter: 5000000    device: 0    UM: 0) at 1
+My libomptarget -->   Map 0x000020015d600000 to device (0x00002000d9800000), size=40000000
+My libomptarget -->   Map 0x00002000e7800000 to device (0x00002001a0000000), size=80000000
+My libomptarget -->   Retrieve 0x00002000e7800000 from 0x00002001a0000000, size=80000000
+My libomptarget -->   Unmap 0x00002000e7800000 from device (0x00002001a0000000), size=80000000
+My libomptarget -->   Retrieve 0x000020015d600000 from 0x00002000d9800000, size=40000000
+My libomptarget -->   Unmap 0x000020015d600000 from device (0x00002000d9800000), size=40000000
+My libomptarget --> COMPUTE (0x0000000010016724)	(#iter: 5000000    device: 0    UM: 0) at 2
+My libomptarget -->   Map 0x00002000e0000000 to device (0x00002000d9800000), size=40000008
+My libomptarget -->   Submit 0x00002000e0000000 to 0x00002000d9800000, size=40000008
+My libomptarget -->   Map 0x00002000e2800000 to device (0x000020019dc00000), size=16
+My libomptarget -->   Submit 0x00002000e2800000 to 0x000020019dc00000, size=16
+My libomptarget -->   Map 0x0000200100000000 to device (0x00002001a0000000), size=1439998560
+My libomptarget -->   Submit 0x0000200100000000 to 0x00002001a0000000, size=1439998560
+My libomptarget -->   Map 0x0000200155e00000 to device (0x00002001f5e00000), size=40000000
+My libomptarget -->   Submit 0x0000200155e00000 to 0x00002001f5e00000, size=40000000
+My libomptarget -->   Map 0x000020015ae00000 to device (0x00002001f8600000), size=40000000
+My libomptarget -->   Map 0x0000200158600000 to device (0x00002001fae00000), size=40000000
+My libomptarget -->   Submit 0x0000200158600000 to 0x00002001fae00000, size=40000000
+My libomptarget -->   Map 0x00002000e2a00000 to device (0x0000200200000000), size=80000000
+My libomptarget -->   Submit 0x00002000e2a00000 to 0x0000200200000000, size=80000000
+My libomptarget -->   Map 0x00002000e7800000 to device (0x0000200204e00000), size=80000000
+My libomptarget -->   Submit 0x00002000e7800000 to 0x0000200204e00000, size=80000000
+My libomptarget -->   Map 0x000020015d600000 to device (0x0000200209c00000), size=40000000
+My libomptarget -->   Submit 0x000020015d600000 to 0x0000200209c00000, size=40000000
+My libomptarget -->   Retrieve 0x000020015d600000 from 0x0000200209c00000, size=40000000
+My libomptarget -->   Unmap 0x000020015d600000 from device (0x0000200209c00000), size=40000000
+My libomptarget -->   Retrieve 0x00002000e7800000 from 0x0000200204e00000, size=80000000
+My libomptarget -->   Unmap 0x00002000e7800000 from device (0x0000200204e00000), size=80000000
+My libomptarget -->   Unmap 0x00002000e2a00000 from device (0x0000200200000000), size=80000000
+My libomptarget -->   Unmap 0x0000200158600000 from device (0x00002001fae00000), size=40000000
+My libomptarget -->   Retrieve 0x000020015ae00000 from 0x00002001f8600000, size=40000000
+My libomptarget -->   Unmap 0x000020015ae00000 from device (0x00002001f8600000), size=40000000
+My libomptarget -->   Unmap 0x0000200155e00000 from device (0x00002001f5e00000), size=40000000
+My libomptarget -->   Unmap 0x0000200100000000 from device (0x00002001a0000000), size=1439998560
+My libomptarget -->   Unmap 0x00002000e2800000 from device (0x000020019dc00000), size=16
+My libomptarget -->   Unmap 0x00002000e0000000 from device (0x00002000d9800000), size=40000008
+My libomptarget --> COMPUTE (0x00000000100166d9)	(#iter: 5000000    device: 0    UM: 0) at 3
+My libomptarget -->   Map 0x00002000e2a00000 to device (0x0000200237600000), size=80000000
+My libomptarget -->   Submit 0x00002000e2a00000 to 0x0000200237600000, size=80000000
+My libomptarget -->   Map 0x00002000e7800000 to device (0x00002000d9800000), size=80000000
+My libomptarget -->   Submit 0x00002000e7800000 to 0x00002000d9800000, size=80000000
+My libomptarget -->   Unmap 0x00002000e7800000 from device (0x00002000d9800000), size=80000000
+My libomptarget -->   Retrieve 0x00002000e2a00000 from 0x0000200237600000, size=80000000
+My libomptarget -->   Unmap 0x00002000e2a00000 from device (0x0000200237600000), size=80000000
+My libomptarget --> COMPUTE (0x00000000100166d8)	(#iter: 5000000    device: 0    UM: 0) at 4
+My libomptarget -->   Map 0x00007fffedfd3f28 to device (0x0000200237600000), size=8
+My libomptarget -->   Submit 0x00007fffedfd3f28 to 0x0000200237600000, size=8
+My libomptarget -->   Map 0x000020015d600000 to device (0x0000200237800000), size=40000000
+My libomptarget -->   Submit 0x000020015d600000 to 0x0000200237800000, size=40000000
+My libomptarget -->   Map 0x00007fffedfd3f20 to device (0x0000200237600200), size=8
+My libomptarget -->   Submit 0x00007fffedfd3f20 to 0x0000200237600200, size=8
+My libomptarget -->   Map 0x00002000e2a00000 to device (0x000020023a000000), size=80000000
+My libomptarget -->   Submit 0x00002000e2a00000 to 0x000020023a000000, size=80000000
+My libomptarget -->   Unmap 0x00002000e2a00000 from device (0x000020023a000000), size=80000000
+My libomptarget -->   Retrieve 0x00007fffedfd3f20 from 0x0000200237600200, size=8
+My libomptarget -->   Unmap 0x00007fffedfd3f20 from device (0x0000200237600200), size=8
+My libomptarget -->   Unmap 0x000020015d600000 from device (0x0000200237800000), size=40000000
+My libomptarget -->   Retrieve 0x00007fffedfd3f28 from 0x0000200237600000, size=8
+My libomptarget -->   Unmap 0x00007fffedfd3f28 from device (0x0000200237600000), size=8
+My libomptarget --> COMPUTE (0x00000000100166f8)	(#iter: 5000000    device: 0    UM: 0) at 5
+My libomptarget -->   Map 0x000020015d600000 to device (0x0000200237600000), size=40000000
+My libomptarget -->   Map 0x00002000e7800000 to device (0x0000200239e00000), size=80000000
+My libomptarget -->   Retrieve 0x00002000e7800000 from 0x0000200239e00000, size=80000000
+My libomptarget -->   Unmap 0x00002000e7800000 from device (0x0000200239e00000), size=80000000
+My libomptarget -->   Retrieve 0x000020015d600000 from 0x0000200237600000, size=40000000
+My libomptarget -->   Unmap 0x000020015d600000 from device (0x0000200237600000), size=40000000
+My libomptarget --> COMPUTE (0x0000000010016724)	(#iter: 5000000    device: 0    UM: 0) at 6
+My libomptarget -->   Map 0x00002000e0000000 to device (0x0000200237600000), size=40000008
+My libomptarget -->   Submit 0x00002000e0000000 to 0x0000200237600000, size=40000008
+My libomptarget -->   Map 0x00002000e2800000 to device (0x0000200239e00000), size=16
+My libomptarget -->   Submit 0x00002000e2800000 to 0x0000200239e00000, size=16
+My libomptarget -->   Map 0x0000200100000000 to device (0x00002001a0000000), size=1439998560
+My libomptarget -->   Submit 0x0000200100000000 to 0x00002001a0000000, size=1439998560
+My libomptarget -->   Map 0x0000200155e00000 to device (0x00002001f5e00000), size=40000000
+My libomptarget -->   Submit 0x0000200155e00000 to 0x00002001f5e00000, size=40000000
+My libomptarget -->   Map 0x000020015ae00000 to device (0x00002001f8600000), size=40000000
+My libomptarget -->   Map 0x0000200158600000 to device (0x00002001fae00000), size=40000000
+My libomptarget -->   Submit 0x0000200158600000 to 0x00002001fae00000, size=40000000
+My libomptarget -->   Map 0x00002000e2a00000 to device (0x000020023a000000), size=80000000
+My libomptarget -->   Submit 0x00002000e2a00000 to 0x000020023a000000, size=80000000
+My libomptarget -->   Map 0x00002000e7800000 to device (0x00002000d9800000), size=80000000
+My libomptarget -->   Submit 0x00002000e7800000 to 0x00002000d9800000, size=80000000
+My libomptarget -->   Map 0x000020015d600000 to device (0x00002001fd600000), size=40000000
+My libomptarget -->   Submit 0x000020015d600000 to 0x00002001fd600000, size=40000000
+My libomptarget -->   Retrieve 0x000020015d600000 from 0x00002001fd600000, size=40000000
+My libomptarget -->   Unmap 0x000020015d600000 from device (0x00002001fd600000), size=40000000
+My libomptarget -->   Retrieve 0x00002000e7800000 from 0x00002000d9800000, size=80000000
+My libomptarget -->   Unmap 0x00002000e7800000 from device (0x00002000d9800000), size=80000000
+My libomptarget -->   Unmap 0x00002000e2a00000 from device (0x000020023a000000), size=80000000
+My libomptarget -->   Unmap 0x0000200158600000 from device (0x00002001fae00000), size=40000000
+My libomptarget -->   Retrieve 0x000020015ae00000 from 0x00002001f8600000, size=40000000
+My libomptarget -->   Unmap 0x000020015ae00000 from device (0x00002001f8600000), size=40000000
+My libomptarget -->   Unmap 0x0000200155e00000 from device (0x00002001f5e00000), size=40000000
+My libomptarget -->   Unmap 0x0000200100000000 from device (0x00002001a0000000), size=1439998560
+My libomptarget -->   Unmap 0x00002000e2800000 from device (0x0000200239e00000), size=16
+My libomptarget -->   Unmap 0x00002000e0000000 from device (0x0000200237600000), size=40000008
+My libomptarget --> COMPUTE (0x00000000100166d9)	(#iter: 5000000    device: 0    UM: 0) at 7
+My libomptarget -->   Map 0x00002000e2a00000 to device (0x0000200237600000), size=80000000
+My libomptarget -->   Submit 0x00002000e2a00000 to 0x0000200237600000, size=80000000
+My libomptarget -->   Map 0x00002000e7800000 to device (0x00002000d9800000), size=80000000
+My libomptarget -->   Submit 0x00002000e7800000 to 0x00002000d9800000, size=80000000
+My libomptarget -->   Unmap 0x00002000e7800000 from device (0x00002000d9800000), size=80000000
+My libomptarget -->   Retrieve 0x00002000e2a00000 from 0x0000200237600000, size=80000000
+My libomptarget -->   Unmap 0x00002000e2a00000 from device (0x0000200237600000), size=80000000
+My libomptarget --> COMPUTE (0x00000000100166d8)	(#iter: 5000000    device: 0    UM: 0) at 8
+My libomptarget -->   Map 0x00007fffedfd3f28 to device (0x0000200237600000), size=8
+My libomptarget -->   Submit 0x00007fffedfd3f28 to 0x0000200237600000, size=8
+My libomptarget -->   Map 0x000020015d600000 to device (0x0000200237800000), size=40000000
+My libomptarget -->   Submit 0x000020015d600000 to 0x0000200237800000, size=40000000
+My libomptarget -->   Map 0x00007fffedfd3f20 to device (0x0000200237600200), size=8
+My libomptarget -->   Submit 0x00007fffedfd3f20 to 0x0000200237600200, size=8
+My libomptarget -->   Map 0x00002000e2a00000 to device (0x000020023a000000), size=80000000
+My libomptarget -->   Submit 0x00002000e2a00000 to 0x000020023a000000, size=80000000
+My libomptarget -->   Unmap 0x00002000e2a00000 from device (0x000020023a000000), size=80000000
+My libomptarget -->   Retrieve 0x00007fffedfd3f20 from 0x0000200237600200, size=8
+My libomptarget -->   Unmap 0x00007fffedfd3f20 from device (0x0000200237600200), size=8
+My libomptarget -->   Unmap 0x000020015d600000 from device (0x0000200237800000), size=40000000
+My libomptarget -->   Retrieve 0x00007fffedfd3f28 from 0x0000200237600000, size=8
+My libomptarget -->   Unmap 0x00007fffedfd3f28 from device (0x0000200237600000), size=8
+Total size: 1.67638
+Time: 0.797371
+Modularity: 8.22212e-07, Iterations: 2, Time (in s): 2.47803
+**********************************************************************
+==158920== Profiling application: ./miniVite -n 5000000
+==158920== Profiling result:
+            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
+ GPU activities:   91.38%  486.10ms        36  13.503ms  2.0480us  166.74ms  [CUDA memcpy DtoD]
+                    8.26%  43.918ms         2  21.959ms  21.908ms  22.010ms  __omp_offloading_35_eeedb51__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1367
+                    0.14%  750.04us         2  375.02us  374.27us  375.77us  __omp_offloading_35_eeedb51__Z21distComputeModularityRK5GraphP4CommPKddi_l395
+                    0.12%  639.87us         2  319.93us  318.62us  321.25us  __omp_offloading_35_eeedb51__Z20distUpdateLocalCinfolP4CommPKS__l435
+                    0.09%  504.73us         2  252.37us  252.00us  252.73us  __omp_offloading_35_eeedb51__Z16distCleanCWandCUlPdP4Comm_l454
+                    0.00%  15.552us         8  1.9440us  1.7600us  2.1760us  [CUDA memcpy DtoH]
+                    0.00%  7.9680us         5  1.5930us  1.3760us  1.7280us  [CUDA memcpy HtoD]
+      API calls:   35.22%  524.24ms         1  524.24ms  524.24ms  524.24ms  cuCtxCreate
+                   29.72%  442.40ms        29  15.255ms  28.595us  167.00ms  cuMemcpyHtoD
+                   19.82%  295.04ms         1  295.04ms  295.04ms  295.04ms  cuCtxDestroy
+                    3.67%  54.573ms        34  1.6051ms  42.012us  8.8931ms  cuMemAlloc
+                    3.52%  52.401ms        20  2.6201ms  90.217us  8.2123ms  cuMemcpyDtoH
+                    3.13%  46.539ms         8  5.8174ms  335.42us  22.099ms  cuCtxSynchronize
+                    1.98%  29.432ms        34  865.65us  34.372us  3.1892ms  cuMemFree
+                    1.41%  21.040ms         9  2.3377ms  63.714us  20.305ms  cuMemAllocManaged
+                    0.78%  11.613ms         1  11.613ms  11.613ms  11.613ms  cuModuleLoadDataEx
+                    0.48%  7.2150ms         1  7.2150ms  7.2150ms  7.2150ms  cuModuleUnload
+                    0.24%  3.5659ms         8  445.74us  30.009us  3.2446ms  cuLaunchKernel
+                    0.01%  186.90us       130  1.4370us     533ns  5.1930us  cuCtxSetCurrent
+                    0.00%  15.519us         8  1.9390us  1.0490us  2.8850us  cuFuncGetAttribute
+                    0.00%  13.594us         5  2.7180us  2.2340us  3.8610us  cuModuleGetGlobal
+                    0.00%  12.258us        21     583ns     272ns  1.0910us  cuDeviceGetAttribute
+                    0.00%  11.703us         6  1.9500us  1.2010us  4.3370us  cuDeviceGetPCIBusId
+                    0.00%  8.8880us         7  1.2690us     623ns  4.5410us  cuDeviceGet
+                    0.00%  7.9890us         4  1.9970us  1.5170us  3.3080us  cuModuleGetFunction
+                    0.00%  1.5340us         3     511ns     361ns     606ns  cuDeviceGetCount
+
+==158920== Unified Memory profiling result:
+Total CPU Page faults: 5172
+
+------------------------------------------------------------
+Sender: LSF System <lsfadmin@batch5>
+Subject: Job 310716: <km> in cluster <summit> Done
+
+Job <km> was submitted from host <login1> by user <lld> in cluster <summit> at Wed Mar 27 16:47:14 2019
+Job was executed on host(s) <1*batch5>, in queue <batch>, as user <lld> in cluster <summit> at Wed Mar 27 16:47:28 2019
+                            <42*a20n12>
+</ccs/home/lld> was used as the home directory.
+</ccs/home/lld/apps/miniVite> was used as the working directory.
+Started at Wed Mar 27 16:47:28 2019
+Terminated at Wed Mar 27 16:47:49 2019
+Results reported at Wed Mar 27 16:47:49 2019
+
+The output (if any) is above this job summary.
+
+My libomptarget --> Set mode to UM
+==13217== NVPROF is profiling process 13217, command: ./miniVite -n 5000000
+Average time to generate 10000000 random numbers using LCG (in s): 0.0182397
+**********************************************************************
+Generated Random Geometric Graph with d: 0.000817469
+Number of vertices: 5000000
+Number of edges: 89999910
+Time to generate distributed graph of 5000000 vertices (in s): 8.38551
+Size: 16 : 8
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e0000000, size=40000008
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e2800000, size=16
+My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200100000000, size=1439998560
+My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200155e00000, size=40000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200158600000, size=40000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x000020015ae00000, size=40000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e2a00000, size=80000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e7800000, size=80000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x000020015d600000, size=40000000
+My libomptarget --> COMPUTE (0x00000000100166f8)	(#iter: 5000000    device: 0    UM: 0) at 1
+My libomptarget -->   Map 0x000020015d600000 to UM, size=40000000
+My libomptarget -->   Map 0x00002000e7800000 to UM, size=80000000
+My libomptarget -->   Unmap 0x00002000e7800000 from UM (0x00002000e7800000), size=80000000
+My libomptarget -->   Unmap 0x000020015d600000 from UM (0x000020015d600000), size=40000000
+My libomptarget --> COMPUTE (0x0000000010016724)	(#iter: 5000000    device: 0    UM: 0) at 2
+My libomptarget -->   Map 0x00002000e0000000 to UM, size=40000008
+My libomptarget -->   Map 0x00002000e2800000 to UM, size=16
+My libomptarget -->   Map 0x0000200100000000 to UM, size=1439998560
+My libomptarget -->   Map 0x0000200155e00000 to UM, size=40000000
+My libomptarget -->   Map 0x000020015ae00000 to UM, size=40000000
+My libomptarget -->   Map 0x0000200158600000 to UM, size=40000000
+My libomptarget -->   Map 0x00002000e2a00000 to UM, size=80000000
+My libomptarget -->   Map 0x00002000e7800000 to UM, size=80000000
+My libomptarget -->   Map 0x000020015d600000 to UM, size=40000000
+My libomptarget -->   Unmap 0x000020015d600000 from UM (0x000020015d600000), size=40000000
+My libomptarget -->   Unmap 0x00002000e7800000 from UM (0x00002000e7800000), size=80000000
+My libomptarget -->   Unmap 0x00002000e2a00000 from UM (0x00002000e2a00000), size=80000000
+My libomptarget -->   Unmap 0x0000200158600000 from UM (0x0000200158600000), size=40000000
+My libomptarget -->   Unmap 0x000020015ae00000 from UM (0x000020015ae00000), size=40000000
+My libomptarget -->   Unmap 0x0000200155e00000 from UM (0x0000200155e00000), size=40000000
+My libomptarget -->   Unmap 0x0000200100000000 from UM (0x0000200100000000), size=1439998560
+My libomptarget -->   Unmap 0x00002000e2800000 from UM (0x00002000e2800000), size=16
+My libomptarget -->   Unmap 0x00002000e0000000 from UM (0x00002000e0000000), size=40000008
+My libomptarget --> COMPUTE (0x00000000100166d9)	(#iter: 5000000    device: 0    UM: 0) at 3
+My libomptarget -->   Map 0x00002000e2a00000 to UM, size=80000000
+My libomptarget -->   Map 0x00002000e7800000 to UM, size=80000000
+My libomptarget -->   Unmap 0x00002000e7800000 from UM (0x00002000e7800000), size=80000000
+My libomptarget -->   Unmap 0x00002000e2a00000 from UM (0x00002000e2a00000), size=80000000
+My libomptarget --> COMPUTE (0x00000000100166d8)	(#iter: 5000000    device: 0    UM: 0) at 4
+My libomptarget -->   Map 0x00007fffda6c2478 to device (0x00002001b7600000), size=8
+My libomptarget -->   Submit 0x00007fffda6c2478 to 0x00002001b7600000, size=8
+My libomptarget -->   Map 0x000020015d600000 to UM, size=40000000
+My libomptarget -->   Map 0x00007fffda6c2470 to device (0x00002001b7600200), size=8
+My libomptarget -->   Submit 0x00007fffda6c2470 to 0x00002001b7600200, size=8
+My libomptarget -->   Map 0x00002000e2a00000 to UM, size=80000000
+My libomptarget -->   Unmap 0x00002000e2a00000 from UM (0x00002000e2a00000), size=80000000
+My libomptarget -->   Retrieve 0x00007fffda6c2470 from 0x00002001b7600200, size=8
+My libomptarget -->   Unmap 0x00007fffda6c2470 from device (0x00002001b7600200), size=8
+My libomptarget -->   Unmap 0x000020015d600000 from UM (0x000020015d600000), size=40000000
+My libomptarget -->   Retrieve 0x00007fffda6c2478 from 0x00002001b7600000, size=8
+My libomptarget -->   Unmap 0x00007fffda6c2478 from device (0x00002001b7600000), size=8
+My libomptarget --> COMPUTE (0x00000000100166f8)	(#iter: 5000000    device: 0    UM: 0) at 5
+My libomptarget -->   Map 0x000020015d600000 to UM, size=40000000
+My libomptarget -->   Map 0x00002000e7800000 to UM, size=80000000
+My libomptarget -->   Unmap 0x00002000e7800000 from UM (0x00002000e7800000), size=80000000
+My libomptarget -->   Unmap 0x000020015d600000 from UM (0x000020015d600000), size=40000000
+My libomptarget --> COMPUTE (0x0000000010016724)	(#iter: 5000000    device: 0    UM: 0) at 6
+My libomptarget -->   Map 0x00002000e0000000 to UM, size=40000008
+My libomptarget -->   Map 0x00002000e2800000 to UM, size=16
+My libomptarget -->   Map 0x0000200100000000 to UM, size=1439998560
+My libomptarget -->   Map 0x0000200155e00000 to UM, size=40000000
+My libomptarget -->   Map 0x000020015ae00000 to UM, size=40000000
+My libomptarget -->   Map 0x0000200158600000 to UM, size=40000000
+My libomptarget -->   Map 0x00002000e2a00000 to UM, size=80000000
+My libomptarget -->   Map 0x00002000e7800000 to UM, size=80000000
+My libomptarget -->   Map 0x000020015d600000 to UM, size=40000000
+My libomptarget -->   Unmap 0x000020015d600000 from UM (0x000020015d600000), size=40000000
+My libomptarget -->   Unmap 0x00002000e7800000 from UM (0x00002000e7800000), size=80000000
+My libomptarget -->   Unmap 0x00002000e2a00000 from UM (0x00002000e2a00000), size=80000000
+My libomptarget -->   Unmap 0x0000200158600000 from UM (0x0000200158600000), size=40000000
+My libomptarget -->   Unmap 0x000020015ae00000 from UM (0x000020015ae00000), size=40000000
+My libomptarget -->   Unmap 0x0000200155e00000 from UM (0x0000200155e00000), size=40000000
+My libomptarget -->   Unmap 0x0000200100000000 from UM (0x0000200100000000), size=1439998560
+My libomptarget -->   Unmap 0x00002000e2800000 from UM (0x00002000e2800000), size=16
+My libomptarget -->   Unmap 0x00002000e0000000 from UM (0x00002000e0000000), size=40000008
+My libomptarget --> COMPUTE (0x00000000100166d9)	(#iter: 5000000    device: 0    UM: 0) at 7
+My libomptarget -->   Map 0x00002000e2a00000 to UM, size=80000000
+My libomptarget -->   Map 0x00002000e7800000 to UM, size=80000000
+My libomptarget -->   Unmap 0x00002000e7800000 from UM (0x00002000e7800000), size=80000000
+My libomptarget -->   Unmap 0x00002000e2a00000 from UM (0x00002000e2a00000), size=80000000
+My libomptarget --> COMPUTE (0x00000000100166d8)	(#iter: 5000000    device: 0    UM: 0) at 8
+My libomptarget -->   Map 0x00007fffda6c2478 to device (0x00002001b7600000), size=8
+My libomptarget -->   Submit 0x00007fffda6c2478 to 0x00002001b7600000, size=8
+My libomptarget -->   Map 0x000020015d600000 to UM, size=40000000
+My libomptarget -->   Map 0x00007fffda6c2470 to device (0x00002001b7600200), size=8
+My libomptarget -->   Submit 0x00007fffda6c2470 to 0x00002001b7600200, size=8
+My libomptarget -->   Map 0x00002000e2a00000 to UM, size=80000000
+My libomptarget -->   Unmap 0x00002000e2a00000 from UM (0x00002000e2a00000), size=80000000
+My libomptarget -->   Retrieve 0x00007fffda6c2470 from 0x00002001b7600200, size=8
+My libomptarget -->   Unmap 0x00007fffda6c2470 from device (0x00002001b7600200), size=8
+My libomptarget -->   Unmap 0x000020015d600000 from UM (0x000020015d600000), size=40000000
+My libomptarget -->   Retrieve 0x00007fffda6c2478 from 0x00002001b7600000, size=8
+My libomptarget -->   Unmap 0x00007fffda6c2478 from device (0x00002001b7600000), size=8
+Total size: 1.67638
+Time: 0.732451
+Modularity: 8.22212e-07, Iterations: 2, Time (in s): 2.4223
+**********************************************************************
+==13217== Profiling application: ./miniVite -n 5000000
+==13217== Profiling result:
+            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
+ GPU activities:   93.79%  493.19ms         2  246.59ms  43.538ms  449.65ms  __omp_offloading_35_eeedb51__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1367
+                    5.96%  31.328ms         2  15.664ms  250.43us  31.078ms  __omp_offloading_35_eeedb51__Z16distCleanCWandCUlPdP4Comm_l454
+                    0.12%  646.84us         2  323.42us  322.85us  324.00us  __omp_offloading_35_eeedb51__Z21distComputeModularityRK5GraphP4CommPKddi_l395
+                    0.12%  640.60us         2  320.30us  317.53us  323.07us  __omp_offloading_35_eeedb51__Z20distUpdateLocalCinfolP4CommPKS__l435
+                    0.00%  14.336us         8  1.7920us  1.6000us  2.0160us  [CUDA memcpy DtoH]
+                    0.00%  7.0400us         5  1.4080us  1.2480us  1.6000us  [CUDA memcpy HtoD]
+      API calls:   38.34%  526.46ms         8  65.807ms  356.21us  449.74ms  cuCtxSynchronize
+                   38.05%  522.45ms         1  522.45ms  522.45ms  522.45ms  cuCtxCreate
+                   19.76%  271.31ms         1  271.31ms  271.31ms  271.31ms  cuCtxDestroy
+                    1.54%  21.175ms         9  2.3528ms  62.599us  20.311ms  cuMemAllocManaged
+                    0.86%  11.793ms         1  11.793ms  11.793ms  11.793ms  cuModuleLoadDataEx
+                    0.54%  7.3927ms         1  7.3927ms  7.3927ms  7.3927ms  cuModuleUnload
+                    0.49%  6.6668ms         8  833.35us  24.849us  6.3629ms  cuLaunchKernel
+                    0.23%  3.1179ms         4  779.48us  20.709us  1.6574ms  cuMemAlloc
+                    0.12%  1.6051ms         4  401.28us  34.284us  770.77us  cuMemFree
+                    0.06%  822.10us         8  102.76us  88.720us  116.52us  cuMemcpyDtoH
+                    0.02%  208.24us         5  41.647us  26.380us  66.923us  cuMemcpyHtoD
+                    0.01%  69.504us        34  2.0440us     543ns  6.6380us  cuCtxSetCurrent
+                    0.00%  17.224us         8  2.1530us  1.3830us  5.5330us  cuFuncGetAttribute
+                    0.00%  13.122us        21     624ns     432ns  1.7660us  cuDeviceGetAttribute
+                    0.00%  13.090us         5  2.6180us  2.0610us  3.8750us  cuModuleGetGlobal
+                    0.00%  10.963us         6  1.8270us  1.1400us  4.0110us  cuDeviceGetPCIBusId
+                    0.00%  8.8770us         7  1.2680us     602ns  4.2380us  cuDeviceGet
+                    0.00%  8.5390us         4  2.1340us  1.4970us  3.8560us  cuModuleGetFunction
+                    0.00%  1.5890us         3     529ns     416ns     599ns  cuDeviceGetCount
+
+==13217== Unified Memory profiling result:
+Device "Tesla V100-SXM2-16GB (0)"
+   Count  Avg Size  Min Size  Max Size  Total Size  Total Time  Name
+    9400  166.63KB  64.000KB  1.3750MB  1.493774GB  56.01582ms  Host To Device
+     812  192.95KB  64.000KB  960.00KB  153.0000MB  5.260943ms  Device To Host
+    1694         -         -         -           -  505.7533ms  Gpu page fault groups
+Total CPU Page faults: 5664
+
+------------------------------------------------------------
+Sender: LSF System <lsfadmin@batch5>
+Subject: Job 310719: <km> in cluster <summit> Done
+
+Job <km> was submitted from host <login1> by user <lld> in cluster <summit> at Wed Mar 27 16:50:55 2019
+Job was executed on host(s) <1*batch5>, in queue <batch>, as user <lld> in cluster <summit> at Wed Mar 27 16:51:02 2019
+                            <42*a29n08>
+</ccs/home/lld> was used as the home directory.
+</ccs/home/lld/apps/miniVite> was used as the working directory.
+Started at Wed Mar 27 16:51:02 2019
+Terminated at Wed Mar 27 16:51:20 2019
+Results reported at Wed Mar 27 16:51:20 2019
+
+The output (if any) is above this job summary.
+
+My libomptarget --> Set mode to SDEV
+==15656== NVPROF is profiling process 15656, command: ./miniVite -n 5000000
+Average time to generate 10000000 random numbers using LCG (in s): 0.0181794
+**********************************************************************
+Generated Random Geometric Graph with d: 0.000817469
+Number of vertices: 5000000
+Number of edges: 89999910
+Time to generate distributed graph of 5000000 vertices (in s): 8.4145
+Size: 16 : 8
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e0000000, size=40000008
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e2800000, size=16
+My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200120000000, size=1439998560
+My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200175e00000, size=40000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x0000200178600000, size=40000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x000020017ae00000, size=40000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e2a00000, size=80000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x00002000e7800000, size=80000000
+My libomptarget --> omp_target_alloc returns uvm ptr 0x000020017d600000, size=40000000
+My libomptarget --> COMPUTE (0x00000000100166f8)	(#iter: 5000000    device: 0    UM: 0) at 1
+My libomptarget -->   Map 0x000020017d600000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x000020017d600000
+My libomptarget -->   Apply opt 1 to 0x000020017d600000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002000e7800000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e7800000
+My libomptarget -->   Apply opt 1 to 0x00002000e7800000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000
+My libomptarget -->   Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000
+My libomptarget --> COMPUTE (0x0000000010016724)	(#iter: 5000000    device: 0    UM: 0) at 2
+My libomptarget -->   Map 0x00002000e0000000 to soft device, size=40000008
+My libomptarget -->   Apply opt 4 to 0x00002000e0000000
+My libomptarget -->   Apply opt 1 to 0x00002000e0000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002000e2800000 to soft device, size=16
+My libomptarget -->   Apply opt 4 to 0x00002000e2800000
+My libomptarget -->   Apply opt 1 to 0x00002000e2800000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x0000200120000000 to soft device, size=1439998560
+My libomptarget -->   Apply opt 4 to 0x0000200120000000
+My libomptarget -->   Apply opt 1 to 0x0000200120000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x0000200175e00000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x0000200175e00000
+My libomptarget -->   Apply opt 1 to 0x0000200175e00000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x000020017ae00000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x000020017ae00000
+My libomptarget -->   Apply opt 1 to 0x000020017ae00000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x0000200178600000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x0000200178600000
+My libomptarget -->   Apply opt 1 to 0x0000200178600000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002000e2a00000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e2a00000
+My libomptarget -->   Apply opt 1 to 0x00002000e2a00000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002000e7800000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e7800000
+My libomptarget -->   Apply opt 1 to 0x00002000e7800000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x000020017d600000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x000020017d600000
+My libomptarget -->   Apply opt 1 to 0x000020017d600000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000
+My libomptarget -->   Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000
+My libomptarget -->   Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000
+My libomptarget -->   Unmap 0x0000200178600000 from soft device (0x0000200178600000), size=40000000
+My libomptarget -->   Unmap 0x000020017ae00000 from soft device (0x000020017ae00000), size=40000000
+My libomptarget -->   Unmap 0x0000200175e00000 from soft device (0x0000200175e00000), size=40000000
+My libomptarget -->   Unmap 0x0000200120000000 from soft device (0x0000200120000000), size=1439998560
+My libomptarget -->   Unmap 0x00002000e2800000 from soft device (0x00002000e2800000), size=16
+My libomptarget -->   Unmap 0x00002000e0000000 from soft device (0x00002000e0000000), size=40000008
+My libomptarget --> COMPUTE (0x00000000100166d9)	(#iter: 5000000    device: 0    UM: 0) at 3
+My libomptarget -->   Map 0x00002000e2a00000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e2a00000
+My libomptarget -->   Apply opt 1 to 0x00002000e2a00000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002000e7800000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e7800000
+My libomptarget -->   Apply opt 1 to 0x00002000e7800000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000
+My libomptarget -->   Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000
+My libomptarget --> COMPUTE (0x00000000100166d8)	(#iter: 5000000    device: 0    UM: 0) at 4
+My libomptarget -->   Map 0x00007fffca5000d8 to device (0x00002001d7600000), size=8
+My libomptarget -->   Submit 0x00007fffca5000d8 to 0x00002001d7600000, size=8
+My libomptarget -->   Map 0x000020017d600000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x000020017d600000
+My libomptarget -->   Apply opt 1 to 0x000020017d600000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00007fffca5000d0 to device (0x00002001d7600200), size=8
+My libomptarget -->   Submit 0x00007fffca5000d0 to 0x00002001d7600200, size=8
+My libomptarget -->   Map 0x00002000e2a00000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e2a00000
+My libomptarget -->   Apply opt 1 to 0x00002000e2a00000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000
+My libomptarget -->   Retrieve 0x00007fffca5000d0 from 0x00002001d7600200, size=8
+My libomptarget -->   Unmap 0x00007fffca5000d0 from device (0x00002001d7600200), size=8
+My libomptarget -->   Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000
+My libomptarget -->   Retrieve 0x00007fffca5000d8 from 0x00002001d7600000, size=8
+My libomptarget -->   Unmap 0x00007fffca5000d8 from device (0x00002001d7600000), size=8
+My libomptarget --> COMPUTE (0x00000000100166f8)	(#iter: 5000000    device: 0    UM: 0) at 5
+My libomptarget -->   Map 0x000020017d600000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x000020017d600000
+My libomptarget -->   Apply opt 1 to 0x000020017d600000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002000e7800000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e7800000
+My libomptarget -->   Apply opt 1 to 0x00002000e7800000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000
+My libomptarget -->   Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000
+My libomptarget --> COMPUTE (0x0000000010016724)	(#iter: 5000000    device: 0    UM: 0) at 6
+My libomptarget -->   Map 0x00002000e0000000 to soft device, size=40000008
+My libomptarget -->   Apply opt 4 to 0x00002000e0000000
+My libomptarget -->   Apply opt 1 to 0x00002000e0000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002000e2800000 to soft device, size=16
+My libomptarget -->   Apply opt 4 to 0x00002000e2800000
+My libomptarget -->   Apply opt 1 to 0x00002000e2800000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x0000200120000000 to soft device, size=1439998560
+My libomptarget -->   Apply opt 4 to 0x0000200120000000
+My libomptarget -->   Apply opt 1 to 0x0000200120000000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x0000200175e00000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x0000200175e00000
+My libomptarget -->   Apply opt 1 to 0x0000200175e00000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x000020017ae00000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x000020017ae00000
+My libomptarget -->   Apply opt 1 to 0x000020017ae00000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x0000200178600000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x0000200178600000
+My libomptarget -->   Apply opt 1 to 0x0000200178600000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002000e2a00000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e2a00000
+My libomptarget -->   Apply opt 1 to 0x00002000e2a00000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002000e7800000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e7800000
+My libomptarget -->   Apply opt 1 to 0x00002000e7800000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x000020017d600000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x000020017d600000
+My libomptarget -->   Apply opt 1 to 0x000020017d600000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000
+My libomptarget -->   Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000
+My libomptarget -->   Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000
+My libomptarget -->   Unmap 0x0000200178600000 from soft device (0x0000200178600000), size=40000000
+My libomptarget -->   Unmap 0x000020017ae00000 from soft device (0x000020017ae00000), size=40000000
+My libomptarget -->   Unmap 0x0000200175e00000 from soft device (0x0000200175e00000), size=40000000
+My libomptarget -->   Unmap 0x0000200120000000 from soft device (0x0000200120000000), size=1439998560
+My libomptarget -->   Unmap 0x00002000e2800000 from soft device (0x00002000e2800000), size=16
+My libomptarget -->   Unmap 0x00002000e0000000 from soft device (0x00002000e0000000), size=40000008
+My libomptarget --> COMPUTE (0x00000000100166d9)	(#iter: 5000000    device: 0    UM: 0) at 7
+My libomptarget -->   Map 0x00002000e2a00000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e2a00000
+My libomptarget -->   Apply opt 1 to 0x00002000e2a00000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00002000e7800000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e7800000
+My libomptarget -->   Apply opt 1 to 0x00002000e7800000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x00002000e7800000 from soft device (0x00002000e7800000), size=80000000
+My libomptarget -->   Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000
+My libomptarget --> COMPUTE (0x00000000100166d8)	(#iter: 5000000    device: 0    UM: 0) at 8
+My libomptarget -->   Map 0x00007fffca5000d8 to device (0x00002001d7600000), size=8
+My libomptarget -->   Submit 0x00007fffca5000d8 to 0x00002001d7600000, size=8
+My libomptarget -->   Map 0x000020017d600000 to soft device, size=40000000
+My libomptarget -->   Apply opt 4 to 0x000020017d600000
+My libomptarget -->   Apply opt 1 to 0x000020017d600000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Map 0x00007fffca5000d0 to device (0x00002001d7600200), size=8
+My libomptarget -->   Submit 0x00007fffca5000d0 to 0x00002001d7600200, size=8
+My libomptarget -->   Map 0x00002000e2a00000 to soft device, size=80000000
+My libomptarget -->   Apply opt 4 to 0x00002000e2a00000
+My libomptarget -->   Apply opt 1 to 0x00002000e2a00000
+My libomptarget --> Invalid optimization
+My libomptarget -->   Unmap 0x00002000e2a00000 from soft device (0x00002000e2a00000), size=80000000
+My libomptarget -->   Retrieve 0x00007fffca5000d0 from 0x00002001d7600200, size=8
+My libomptarget -->   Unmap 0x00007fffca5000d0 from device (0x00002001d7600200), size=8
+My libomptarget -->   Unmap 0x000020017d600000 from soft device (0x000020017d600000), size=40000000
+My libomptarget -->   Retrieve 0x00007fffca5000d8 from 0x00002001d7600000, size=8
+My libomptarget -->   Unmap 0x00007fffca5000d8 from device (0x00002001d7600000), size=8
+Total size: 1.67638
+Time: 0.712737
+Modularity: 8.22212e-07, Iterations: 2, Time (in s): 2.37011
+**********************************************************************
+==15656== Profiling application: ./miniVite -n 5000000
+==15656== Profiling result:
+            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
+ GPU activities:   93.47%  443.05ms         2  221.53ms  20.894ms  422.16ms  __omp_offloading_35_eeedb51__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1367
+                    6.26%  29.668ms         2  14.834ms  252.83us  29.415ms  __omp_offloading_35_eeedb51__Z16distCleanCWandCUlPdP4Comm_l454
+                    0.14%  645.59us         2  322.80us  322.37us  323.23us  __omp_offloading_35_eeedb51__Z21distComputeModularityRK5GraphP4CommPKddi_l395
+                    0.13%  635.20us         2  317.60us  317.05us  318.14us  __omp_offloading_35_eeedb51__Z20distUpdateLocalCinfolP4CommPKS__l435
+                    0.00%  14.368us         8  1.7960us  1.6000us  2.0160us  [CUDA memcpy DtoH]
+                    0.00%  7.0720us         5  1.4140us  1.2480us  1.6000us  [CUDA memcpy HtoD]
+      API calls:   40.70%  520.60ms         1  520.60ms  520.60ms  520.60ms  cuCtxCreate
+                   37.11%  474.66ms         8  59.333ms  348.41us  422.24ms  cuCtxSynchronize
+                   17.93%  229.37ms         1  229.37ms  229.37ms  229.37ms  cuCtxDestroy
+                    1.65%  21.044ms         9  2.3382ms  77.557us  20.303ms  cuMemAllocManaged
+                    0.90%  11.558ms         1  11.558ms  11.558ms  11.558ms  cuModuleLoadDataEx
+                    0.57%  7.2276ms         1  7.2276ms  7.2276ms  7.2276ms  cuModuleUnload
+                    0.52%  6.5948ms         8  824.34us  24.212us  6.3617ms  cuLaunchKernel
+                    0.22%  2.7807ms        30  92.689us  3.1470us  1.9706ms  cuMemAdvise
+                    0.20%  2.5650ms         4  641.26us  20.647us  1.3774ms  cuMemAlloc
+                    0.12%  1.5353ms         4  383.83us  30.558us  741.42us  cuMemFree
+                    0.06%  830.25us         8  103.78us  88.896us  125.59us  cuMemcpyDtoH
+                    0.01%  168.38us         5  33.675us  26.187us  48.213us  cuMemcpyHtoD
+                    0.00%  53.465us        34  1.5720us     641ns  4.6320us  cuCtxSetCurrent
+                    0.00%  13.006us         5  2.6010us  2.0720us  3.2640us  cuModuleGetGlobal
+                    0.00%  12.896us        21     614ns     365ns  1.1060us  cuDeviceGetAttribute
+                    0.00%  12.454us         8  1.5560us     890ns  2.4160us  cuFuncGetAttribute
+                    0.00%  11.742us         6  1.9570us  1.1400us  4.6640us  cuDeviceGetPCIBusId
+                    0.00%  8.6470us         7  1.2350us     597ns  4.3750us  cuDeviceGet
+                    0.00%  7.5310us         4  1.8820us  1.4760us  2.8370us  cuModuleGetFunction
+                    0.00%  1.5450us         3     515ns     329ns     621ns  cuDeviceGetCount
+
+==15656== Unified Memory profiling result:
+Device "Tesla V100-SXM2-16GB (0)"
+   Count  Avg Size  Min Size  Max Size  Total Size  Total Time  Name
+    9041  158.13KB  64.000KB  1.3125MB  1.363464GB  51.94749ms  Host To Device
+    1481         -         -         -           -  448.4253ms  Gpu page fault groups
+      40  1.9094MB  192.00KB  2.0000MB  76.37500MB           -  Remote mapping to device
+Total CPU Page faults: 5212
+Total remote mappings from CPU: 40
+
+------------------------------------------------------------
+Sender: LSF System <lsfadmin@batch5>
+Subject: Job 310728: <km> in cluster <summit> Done
+
+Job <km> was submitted from host <login1> by user <lld> in cluster <summit> at Wed Mar 27 17:01:23 2019
+Job was executed on host(s) <1*batch5>, in queue <batch>, as user <lld> in cluster <summit> at Wed Mar 27 17:01:36 2019
+                            <42*h36n13>
+</ccs/home/lld> was used as the home directory.
+</ccs/home/lld/apps/miniVite> was used as the working directory.
+Started at Wed Mar 27 17:01:36 2019
+Terminated at Wed Mar 27 17:01:53 2019
+Results reported at Wed Mar 27 17:01:53 2019
+
+The output (if any) is above this job summary.
+
diff --git a/miniVite/logcmpl b/miniVite/logcmpl
new file mode 100644
index 0000000..aa4afc3
--- /dev/null
+++ b/miniVite/logcmpl
@@ -0,0 +1,5666 @@
+mpicxx -std=c++11 -g -O3 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -DOMP_GPU -DOMP_GPU_ALLOC -DCHECK_NUM_EDGES  -Xclang -load -Xclang ~/git/unifiedmem/code/llvm-pass/build/uvm/libOMPPass.so -c -o main.o main.cpp
+In file included from main.cpp:58:
+In file included from ./dspl_gpu_kernel.hpp:58:
+In file included from ./graph.hpp:56:
+./utils.hpp:263:56: warning: using floating point absolute value function 'fabs' when argument is of integer type [-Wabsolute-value]
+                drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult); // 0-1
+                                                       ^
+./utils.hpp:263:56: note: use function 'std::abs' instead
+                drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult); // 0-1
+                                                       ^~~~
+                                                       std::abs
+  ---- Function Argument Access Frequency CG Analysis ----
+On function _Z7is_pwr2i
+Round 0
+Round end
+On function _Z8reseederj
+Round 0
+Round end
+On function _ZNSt8seed_seq8generateIN9__gnu_cxx17__normal_iteratorIPjSt6vectorIjSaIjEEEEEEvT_S8_
+Round 0
+  alias entry   %18 = getelementptr inbounds %"class.std::seed_seq", %"class.std::seed_seq"* %0, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !10369
+    alias entry   %19 = bitcast i32** %18 to i64*, !dbg !10369
+    alias entry   %21 = bitcast %"class.std::seed_seq"* %0 to i64*, !dbg !10376
+Round 1
+Round end
+    load (6.274510e-01) from %"class.std::seed_seq"* %0
+    load (6.274510e-01) from %"class.std::seed_seq"* %0
+  Frequency of %"class.std::seed_seq"* %0
+  load: 1.254902e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z4lockv
+Round 0
+Round end
+On function _Z6unlockv
+Round 0
+Round end
+On function _Z19distSumVertexDegreeRK5GraphRSt6vectorIdSaIdEERS2_I4CommSaIS6_EE
+Round 0
+  alias entry   %6 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10459
+Round 1
+Round end
+    load (1.000000e+00) from %class.Graph* %0
+  Frequency of %class.Graph* %0
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.10"* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.15"* %2
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function __clang_call_terminate
+Round 0
+Round end
+  Frequency of i8* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined.
+Round 0
+  alias entry   %25 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 0, i32 0, i32 0, i32 0
+  alias entry   %26 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 1, i32 0, i32 0, i32 0
+  alias entry   %27 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %28 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %5, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (6.350000e+00) from %class.Graph* %3
+    load (6.350000e+00) from %class.Graph* %3
+    load (6.350000e+00) from %"class.std::vector.10"* %4
+    load (6.350000e+00) from %"class.std::vector.15"* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %class.Graph* %3
+  load: 1.270000e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.10"* %4
+  load: 6.350000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.15"* %5
+  load: 6.350000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z29distCalcConstantForSecondTermRKSt6vectorIdSaIdEEP19ompi_communicator_t
+Round 0
+  alias entry   %9 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10283
+    alias entry   %10 = bitcast double** %9 to i64*, !dbg !10283
+    alias entry   %12 = bitcast %"class.std::vector.10"* %0 to i64*, !dbg !10288
+Round 1
+Round end
+    load (1.000000e+00) from %"class.std::vector.10"* %0
+    load (1.000000e+00) from %"class.std::vector.10"* %0
+  Frequency of %"class.std::vector.10"* %0
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.ompi_communicator_t* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func
+Round 0
+    alias entry   %3 = bitcast i8* %1 to double**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to double**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..2
+Round 0
+  alias entry   %32 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 0
+    alias entry   %102 = bitcast double* %3 to i64*, !dbg !10325
+Round 1
+Round end
+    load (3.157895e-01) from %"class.std::vector.10"* %4
+    load (2.105263e-01) from double* %3
+    store (2.105263e-01) to double* %3
+    load (2.105263e-01) from double* %3
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %3
+  load: 4.210526e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.10"* %4
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z12distInitCommRSt6vectorIlSaIlEES2_l
+Round 0
+  alias entry   %6 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10273
+    alias entry   %7 = bitcast i64** %6 to i64*, !dbg !10273
+    alias entry   %9 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10280
+Round 1
+Round end
+    load (1.000000e+00) from %"class.std::vector.0"* %1
+    load (1.000000e+00) from %"class.std::vector.0"* %1
+  Frequency of %"class.std::vector.0"* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %1
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..4
+Round 0
+  alias entry   %29 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+  alias entry   %31 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (3.200000e-01) from %"class.std::vector.0"* %3
+    load (3.200000e-01) from %"class.std::vector.0"* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %3
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %5
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z15distInitLouvainRK5GraphRSt6vectorIlSaIlEES5_RS2_IdSaIdEES8_RS2_I4CommSaIS9_EESC_Rdi
+Round 0
+  alias entry   %16 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !10485
+  alias entry   %20 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10502
+  alias entry   %22 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10514
+  alias entry   %24 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %3, i64 0, i32 0, i32 0, i32 1, !dbg !10532
+    alias entry   %25 = bitcast double** %24 to i64*, !dbg !10532
+    alias entry   %27 = bitcast %"class.std::vector.10"* %3 to i64*, !dbg !10536
+  alias entry   %40 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10572
+    alias entry   %41 = bitcast i64** %40 to i64*, !dbg !10572
+    alias entry   %43 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10574
+  alias entry   %56 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %2, i64 0, i32 0, i32 0, i32 1, !dbg !10600
+    alias entry   %57 = bitcast i64** %56 to i64*, !dbg !10600
+    alias entry   %59 = bitcast %"class.std::vector.0"* %2 to i64*, !dbg !10601
+  alias entry   %72 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 1, !dbg !10622
+    alias entry   %73 = bitcast double** %72 to i64*, !dbg !10622
+    alias entry   %75 = bitcast %"class.std::vector.10"* %4 to i64*, !dbg !10623
+  alias entry   %88 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %5, i64 0, i32 0, i32 0, i32 1, !dbg !10654
+    alias entry   %89 = bitcast %struct.Comm** %88 to i64*, !dbg !10654
+    alias entry   %91 = bitcast %"class.std::vector.15"* %5 to i64*, !dbg !10658
+  alias entry   %104 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %6, i64 0, i32 0, i32 0, i32 1, !dbg !10685
+    alias entry   %105 = bitcast %struct.Comm** %104 to i64*, !dbg !10685
+    alias entry   %107 = bitcast %"class.std::vector.15"* %6 to i64*, !dbg !10686
+Round 1
+Round end
+    load (1.000000e+00) from %class.Graph* %0
+    load (1.000000e+00) from %class.Graph* %0
+    load (1.000000e+00) from %class.Graph* %0
+    load (1.000000e+00) from %"class.std::vector.10"* %3
+    load (1.000000e+00) from %"class.std::vector.10"* %3
+Warning: wrong traversal order, or recursive call
+On function _Z15distGetMaxIndexP7clmap_tRiPdS1_dPK4Commdldllld
+Round 0
+  alias entry   %22 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 %21
+  alias entry   %28 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 0, !dbg !10320
+  alias entry   %33 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %32, i32 1, !dbg !10330
+  alias entry   %35 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %32, i32 0, !dbg !10333
+  alias entry   %37 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 1, !dbg !10335
+  alias entry   %41 = getelementptr inbounds double, double* %2, i64 %38, !dbg !10340
+  alias entry   %60 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 1, !dbg !10352
+  alias entry   %81 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %80, i32 1, !dbg !10330
+  alias entry   %83 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %80, i32 0, !dbg !10333
+  alias entry   %89 = getelementptr inbounds double, double* %2, i64 %86, !dbg !10340
+  alias entry   %126 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %125, i32 1, !dbg !10330
+  alias entry   %128 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %125, i32 0, !dbg !10333
+  alias entry   %134 = getelementptr inbounds double, double* %2, i64 %131, !dbg !10340
+Round 1
+Round end
+    load (1.000000e+00) from double* %2
+    load (1.000000e+00) from i32* %3
+    load (1.000000e+00) from i32* %1
+    load (5.000000e-01) from %struct.clmap_t* %0
+    load (2.500000e-01) from %struct.Comm* %5
+    load (2.500000e-01) from %struct.Comm* %5
+    load (2.500000e-01) from %struct.clmap_t* %0
+    load (1.250000e-01) from double* %2
+    load (9.984375e+00) from %struct.Comm* %5
+    load (9.984375e+00) from %struct.Comm* %5
+    load (4.984375e+00) from double* %2
+    load (9.984375e+00) from %struct.Comm* %5
+    load (9.984375e+00) from %struct.Comm* %5
+    load (4.984375e+00) from double* %2
+  Frequency of %struct.clmap_t* %0
+  load: 7.500000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %2
+  load: 1.109375e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %3
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %5
+  load: 4.043750e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z24distBuildLocalMapCounterllP7clmap_tRiPdS1_PK4EdgePKllll
+Round 0
+  alias entry   %21 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %20, i32 0, !dbg !10308
+  alias entry   %22 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %20, i32 1, !dbg !10310
+  alias entry   %31 = getelementptr inbounds i64, i64* %7, i64 %30, !dbg !10326
+  alias entry   %39 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %37, i32 0, !dbg !10337
+  alias entry   %48 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %35, !dbg !10348
+  alias entry   %58 = getelementptr inbounds double, double* %4, i64 %52, !dbg !10358
+  alias entry   %64 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %35, i32 0, !dbg !10364
+  alias entry   %65 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %35, i32 1, !dbg !10367
+    alias entry   %71 = bitcast double* %22 to i64*, !dbg !10375
+  alias entry   %74 = getelementptr inbounds double, double* %4, i64 %73, !dbg !10377
+    alias entry   %75 = bitcast double* %74 to i64*, !dbg !10378
+Round 1
+Round end
+    load (1.593750e+01) from %struct.Edge* %6
+    load (7.937500e+00) from %struct.Edge* %6
+    load (1.593750e+01) from i64* %7
+    load (1.593750e+01) from i32* %3
+    load (1.625000e+02) from %struct.clmap_t* %2
+    load (9.937500e+00) from i32* %5
+    load (4.937500e+00) from %struct.Edge* %6
+    load (4.937500e+00) from double* %4
+    store (4.937500e+00) to double* %4
+    store (5.437500e+00) to %struct.clmap_t* %2
+    store (5.437500e+00) to %struct.clmap_t* %2
+    store (5.437500e+00) to i32* %3
+    load (1.093750e+01) from i32* %5
+    load (5.437500e+00) from %struct.Edge* %6
+    store (5.437500e+00) to double* %4
+    store (5.437500e+00) to i32* %5
+  Frequency of %struct.clmap_t* %2
+  load: 1.625000e+02		  store: 1.087500e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %3
+  load: 1.593750e+01		  store: 5.437500e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %4
+  load: 4.937500e+00		  store: 1.037500e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %5
+  load: 2.087500e+01		  store: 5.437500e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Edge* %6
+  load: 3.425000e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %7
+  load: 1.593750e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z27distExecuteLouvainIterationlPKlS0_PK4EdgeS0_PlPKdP4CommS8_dPdi
+Round 0
+  alias entry   %18 = getelementptr inbounds i64, i64* %2, i64 %17, !dbg !10316
+  alias entry   %20 = getelementptr inbounds i64, i64* %4, i64 %0, !dbg !10322
+  alias entry   %23 = getelementptr inbounds i64, i64* %1, i64 %0, !dbg !10329
+  alias entry   %26 = getelementptr inbounds i64, i64* %1, i64 %25, !dbg !10332
+  alias entry   %30 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %22, i32 0, !dbg !10337
+  alias entry   %32 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %22, i32 1, !dbg !10341
+  alias entry   %47 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %46, i32 0, !dbg !10401
+  alias entry   %48 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %46, i32 1, !dbg !10403
+  alias entry   %57 = getelementptr inbounds i64, i64* %4, i64 %56, !dbg !10414
+    alias entry   %95 = bitcast double* %48 to i64*, !dbg !10457
+  alias entry   %118 = getelementptr inbounds double, double* %10, i64 %0, !dbg !10470
+  alias entry   %122 = getelementptr inbounds double, double* %6, i64 %0, !dbg !10473
+  alias entry   %140 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %139, i32 1, !dbg !10533
+  alias entry   %142 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %139, i32 0, !dbg !10534
+  alias entry   %188 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %187, i32 1, !dbg !10533
+  alias entry   %190 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %187, i32 0, !dbg !10534
+  alias entry   %236 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %235, i32 1, !dbg !10572
+    alias entry   %237 = bitcast double* %236 to i64*, !dbg !10573
+  alias entry   %248 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %235, i32 0, !dbg !10575
+  alias entry   %250 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %22, i32 1, !dbg !10578
+    alias entry   %252 = bitcast double* %250 to i64*, !dbg !10581
+  alias entry   %263 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %22, i32 0, !dbg !10583
+  alias entry   %267 = getelementptr inbounds i64, i64* %5, i64 %0, !dbg !10587
+  alias entry   %270 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %269, i32 1, !dbg !10533
+  alias entry   %272 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %269, i32 0, !dbg !10534
+Round 1
+Round end
+    load (1.000000e+00) from i64* %2
+    load (1.000000e+00) from i64* %4
+    load (1.000000e+00) from i64* %1
+    load (1.000000e+00) from i64* %1
+    load (5.000000e-01) from %struct.Comm* %7
+    load (5.000000e-01) from %struct.Comm* %7
+    load (7.992188e+00) from %struct.Edge* %3
+    load (3.992188e+00) from %struct.Edge* %3
+    load (7.992188e+00) from i64* %4
+    load (2.492188e+00) from %struct.Edge* %3
+    load (2.742188e+00) from %struct.Edge* %3
+    load (5.000000e-01) from double* %10
+    store (5.000000e-01) to double* %10
+    load (5.000000e-01) from double* %6
+    load (1.250000e-01) from %struct.Comm* %7
+    load (1.250000e-01) from %struct.Comm* %7
+    load (4.992188e+00) from %struct.Comm* %7
+    load (4.992188e+00) from %struct.Comm* %7
+    load (2.500000e-01) from %struct.Comm* %8
+    load (2.500000e-01) from double* %6
+    load (2.500000e-01) from %struct.Comm* %8
+    store (1.000000e+00) to i64* %5
+    load (4.992188e+00) from %struct.Comm* %7
+    load (4.992188e+00) from %struct.Comm* %7
+  Frequency of i64* %1
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Edge* %3
+  load: 1.721875e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %4
+  load: 8.992188e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 0.000000e+00		  store: 1.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %6
+  load: 7.500000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %7
+  load: 2.121875e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %8
+  load: 5.000000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %10
+  load: 5.000000e-01		  store: 5.000000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z21distComputeModularityRK5GraphP4CommPKddi
+Round 0
+  alias entry   %14 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10288
+  alias entry   %16 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10304
+    base alias entry   %35 = bitcast i8** %34 to double**, !dbg !10317
+    base alias entry   %37 = bitcast i8** %36 to double**, !dbg !10317
+    base alias entry   %45 = bitcast i8** %44 to %struct.Comm**, !dbg !10317
+    base alias entry   %47 = bitcast i8** %46 to %struct.Comm**, !dbg !10317
+Round 1
+  base alias entry   %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317
+  base alias entry   %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317
+  base alias entry   %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317
+  base alias entry   %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317
+Round 2
+  base alias offset entry (2)   %11 = alloca [5 x i8*], align 8
+  base alias offset entry (2)   %12 = alloca [5 x i8*], align 8
+  base alias offset entry (-1)   %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317
+  base alias offset entry (-1)   %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317
+  base alias offset entry (4)   %11 = alloca [5 x i8*], align 8
+  base alias offset entry (-2)   %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317
+  base alias offset entry (4)   %12 = alloca [5 x i8*], align 8
+  base alias offset entry (-2)   %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317
+Round 3
+  base alias offset entry (4)   %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317
+  base alias offset entry (2)   %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317
+  base alias offset entry (4)   %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317
+  base alias offset entry (2)   %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317
+  base alias offset entry (3)   %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317
+  base alias offset entry (1)   %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317
+  base alias offset entry (3)   %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317
+  base alias offset entry (1)   %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317
+  base alias offset entry (2)   %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317
+  base alias offset entry (2)   %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317
+  base alias offset entry (1)   %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317
+  base alias offset entry (1)   %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317
+Round 4
+Round end
+    load (1.000000e+00) from %class.Graph* %0
+    load (1.000000e+00) from %class.Graph* %0
+  Frequency of %class.Graph* %0
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %2
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.7
+Round 0
+    alias entry   %3 = bitcast i8* %1 to double**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to double**, !dbg !10261
+  alias entry   %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261
+    alias entry   %8 = bitcast i8* %7 to double**, !dbg !10261
+  alias entry   %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261
+    alias entry   %11 = bitcast i8* %10 to double**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..8
+Round 0
+  alias entry   %40 = getelementptr inbounds double, double* %6, i64 %39, !dbg !10318
+  alias entry   %43 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %39, i32 1, !dbg !10321
+    alias entry   %63 = bitcast double* %5 to i64*, !dbg !10329
+    alias entry   %75 = bitcast double* %7 to i64*, !dbg !10329
+Round 1
+Round end
+    load (1.010526e+01) from double* %6
+    load (1.010526e+01) from %struct.Comm* %8
+    load (2.105263e-01) from double* %5
+    store (2.105263e-01) to double* %5
+    load (2.105263e-01) from double* %7
+    store (2.105263e-01) to double* %7
+    load (2.105263e-01) from double* %5
+    load (2.105263e-01) from double* %7
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %5
+  load: 4.210526e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %6
+  load: 1.010526e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %7
+  load: 4.210526e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %8
+  load: 1.010526e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.9
+Round 0
+    alias entry   %3 = bitcast i8* %1 to double**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to double**, !dbg !10261
+  alias entry   %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261
+    alias entry   %8 = bitcast i8* %7 to double**, !dbg !10261
+  alias entry   %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261
+    alias entry   %11 = bitcast i8* %10 to double**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..10
+Round 0
+    alias entry   %67 = bitcast double* %3 to i64*, !dbg !10310
+    alias entry   %79 = bitcast double* %5 to i64*, !dbg !10310
+Round 1
+Round end
+    load (2.916667e-01) from double* %3
+    store (2.916667e-01) to double* %3
+    load (2.916667e-01) from double* %5
+    store (2.916667e-01) to double* %5
+    load (3.333333e-01) from double* %3
+    load (3.333333e-01) from double* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %3
+  load: 6.250000e-01		  store: 2.916667e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %4
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %5
+  load: 6.250000e-01		  store: 2.916667e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %6
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z20distUpdateLocalCinfolP4CommPKS_
+Round 0
+    base alias entry   %15 = bitcast i8** %14 to %struct.Comm**, !dbg !10269
+    base alias entry   %17 = bitcast i8** %16 to %struct.Comm**, !dbg !10269
+    base alias entry   %20 = bitcast i8** %19 to %struct.Comm**, !dbg !10269
+    base alias entry   %22 = bitcast i8** %21 to %struct.Comm**, !dbg !10269
+Round 1
+  base alias entry   %14 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269
+  base alias entry   %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269
+  base alias entry   %19 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269
+  base alias entry   %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269
+Round 2
+  base alias offset entry (1)   %5 = alloca [3 x i8*], align 8
+  base alias offset entry (1)   %6 = alloca [3 x i8*], align 8
+  base alias offset entry (2)   %5 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %19 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269
+  base alias offset entry (2)   %6 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269
+Round 3
+  base alias offset entry (1)   %9 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269
+  base alias offset entry (2)   %9 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269
+  base alias offset entry (1)   %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269
+  base alias offset entry (2)   %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269
+  base alias offset entry (1)   %14 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269
+  base alias offset entry (1)   %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269
+Round 4
+Round end
+  Frequency of %struct.Comm* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %2
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..13
+Round 0
+  alias entry   %33 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %26, !dbg !10304
+  alias entry   %36 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %35, i32 1, !dbg !10304
+  alias entry   %37 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %26, !dbg !10304
+  alias entry   %38 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %35, i32 1, !dbg !10304
+  alias entry   %39 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %26, i32 1, !dbg !10304
+  alias entry   %41 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %40, !dbg !10304
+  alias entry   %42 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %26, i32 1, !dbg !10304
+  alias entry   %43 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %40, !dbg !10304
+    alias entry   %44 = bitcast double* %38 to %struct.Comm*, !dbg !10304
+    alias entry   %46 = bitcast double* %36 to %struct.Comm*, !dbg !10304
+    alias entry   %49 = bitcast %struct.Comm* %43 to double*, !dbg !10304
+    alias entry   %51 = bitcast %struct.Comm* %41 to double*, !dbg !10304
+  alias entry   %67 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %61, i32 0, !dbg !10304
+  alias entry   %68 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %62, i32 0, !dbg !10304
+  alias entry   %69 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %63, i32 0, !dbg !10304
+  alias entry   %70 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %64, i32 0, !dbg !10304
+  alias entry   %71 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %65, i32 0, !dbg !10304
+  alias entry   %72 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %66, i32 0, !dbg !10304
+    alias entry   %73 = bitcast i64* %67 to <4 x i64>*, !dbg !10304
+    alias entry   %74 = bitcast i64* %68 to <4 x i64>*, !dbg !10304
+    alias entry   %75 = bitcast i64* %69 to <4 x i64>*, !dbg !10304
+    alias entry   %76 = bitcast i64* %70 to <4 x i64>*, !dbg !10304
+    alias entry   %77 = bitcast i64* %71 to <4 x i64>*, !dbg !10304
+    alias entry   %78 = bitcast i64* %72 to <4 x i64>*, !dbg !10304
+  alias entry   %97 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %61, i32 0, !dbg !10307
+  alias entry   %98 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %62, i32 0, !dbg !10307
+  alias entry   %99 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %63, i32 0, !dbg !10307
+  alias entry   %100 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %64, i32 0, !dbg !10307
+  alias entry   %101 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %65, i32 0, !dbg !10307
+  alias entry   %102 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %66, i32 0, !dbg !10307
+    alias entry   %103 = bitcast i64* %97 to <4 x i64>*, !dbg !10307
+    alias entry   %104 = bitcast i64* %98 to <4 x i64>*, !dbg !10307
+    alias entry   %105 = bitcast i64* %99 to <4 x i64>*, !dbg !10307
+    alias entry   %106 = bitcast i64* %100 to <4 x i64>*, !dbg !10307
+    alias entry   %107 = bitcast i64* %101 to <4 x i64>*, !dbg !10307
+    alias entry   %108 = bitcast i64* %102 to <4 x i64>*, !dbg !10307
+  alias entry   %139 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %61, i32 1, !dbg !10309
+  alias entry   %140 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %62, i32 1, !dbg !10309
+  alias entry   %141 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %63, i32 1, !dbg !10309
+  alias entry   %142 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %64, i32 1, !dbg !10309
+  alias entry   %143 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %65, i32 1, !dbg !10309
+  alias entry   %144 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %66, i32 1, !dbg !10309
+  alias entry   %151 = getelementptr inbounds double, double* %139, i64 -1, !dbg !10309
+    alias entry   %152 = bitcast double* %151 to <4 x double>*, !dbg !10309
+  alias entry   %153 = getelementptr inbounds double, double* %140, i64 -1, !dbg !10309
+    alias entry   %154 = bitcast double* %153 to <4 x double>*, !dbg !10309
+  alias entry   %155 = getelementptr inbounds double, double* %141, i64 -1, !dbg !10309
+    alias entry   %156 = bitcast double* %155 to <4 x double>*, !dbg !10309
+  alias entry   %157 = getelementptr inbounds double, double* %142, i64 -1, !dbg !10309
+    alias entry   %158 = bitcast double* %157 to <4 x double>*, !dbg !10309
+  alias entry   %159 = getelementptr inbounds double, double* %143, i64 -1, !dbg !10309
+    alias entry   %160 = bitcast double* %159 to <4 x double>*, !dbg !10309
+  alias entry   %161 = getelementptr inbounds double, double* %144, i64 -1, !dbg !10309
+    alias entry   %162 = bitcast double* %161 to <4 x double>*, !dbg !10309
+  alias entry   %183 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %182, i32 0, !dbg !10304
+  alias entry   %185 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %182, i32 0, !dbg !10307
+  alias entry   %188 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %182, i32 1, !dbg !10318
+  alias entry   %190 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %182, i32 1, !dbg !10309
+Round 1
+Round end
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %5
+    load (2.500000e+00) from %struct.Comm* %5
+    load (2.500000e+00) from %struct.Comm* %5
+    load (2.500000e+00) from %struct.Comm* %5
+    load (2.500000e+00) from %struct.Comm* %5
+    load (2.500000e+00) from %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    load (9.088235e+00) from %struct.Comm* %6
+    load (9.088235e+00) from %struct.Comm* %5
+    store (9.088235e+00) to %struct.Comm* %5
+    load (9.088235e+00) from %struct.Comm* %6
+    load (9.088235e+00) from %struct.Comm* %5
+    store (9.088235e+00) to %struct.Comm* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %5
+  load: 3.317647e+01		  store: 3.317647e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %6
+  load: 3.317647e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..14
+Round 0
+Round end
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %3
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %4
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z16distCleanCWandCUlPdP4Comm
+Round 0
+    base alias entry   %17 = bitcast i8** %16 to double**, !dbg !10269
+    base alias entry   %19 = bitcast i8** %18 to double**, !dbg !10269
+    base alias entry   %22 = bitcast i8** %21 to %struct.Comm**, !dbg !10269
+    base alias entry   %24 = bitcast i8** %23 to %struct.Comm**, !dbg !10269
+Round 1
+  base alias entry   %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269
+  base alias entry   %18 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269
+  base alias entry   %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269
+  base alias entry   %23 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269
+Round 2
+  base alias offset entry (1)   %5 = alloca [3 x i8*], align 8
+  base alias offset entry (1)   %6 = alloca [3 x i8*], align 8
+  base alias offset entry (2)   %5 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269
+  base alias offset entry (2)   %6 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %23 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269
+Round 3
+  base alias offset entry (1)   %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269
+  base alias offset entry (2)   %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269
+  base alias offset entry (1)   %13 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269
+  base alias offset entry (2)   %13 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269
+  base alias offset entry (1)   %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269
+  base alias offset entry (1)   %18 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269
+Round 4
+Round end
+  Frequency of double* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %2
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..18
+Round 0
+  alias entry   %30 = getelementptr inbounds double, double* %5, i64 %29, !dbg !10304
+  alias entry   %31 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %29, i32 0, !dbg !10309
+    alias entry   %34 = bitcast i64* %31 to i8*, !dbg !10299
+Round 1
+Round end
+    store (1.058333e+01) to double* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %5
+  load: 0.000000e+00		  store: 1.058333e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %6
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..19
+Round 0
+Round end
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %3
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %4
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z21fillRemoteCommunitiesRK5GraphiiRKmS3_RKSt6vectorIlSaIlEES8_S8_S8_S8_RKS4_I4CommSaIS9_EERSt3mapIlS9_St4lessIlESaISt4pairIKlS9_EEERSt13unordered_mapIllSt4hashIlESt8equal_toIlESaISH_ISI_lEEESM_
+Round 0
+  alias entry   %126 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !11433
+  alias entry   %130 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !11449
+  alias entry   %132 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !11460
+  alias entry   %190 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0
+  alias entry   %197 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0
+  alias entry   %301 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 2, i32 0, !dbg !11792
+    alias entry   %302 = bitcast %"struct.std::__detail::_Hash_node_base"* %301 to %"struct.std::__detail::_Hash_node"**, !dbg !11793
+    alias entry   %312 = bitcast %"class.std::unordered_map"* %12 to i8**, !dbg !11836
+  alias entry   %314 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 1, !dbg !11842
+    alias entry   %317 = bitcast %"struct.std::__detail::_Hash_node_base"* %301 to i8*, !dbg !11846
+  alias entry   %320 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %8, i64 0, i32 0, i32 0, i32 0
+  alias entry   %321 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0
+  alias entry   %322 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 0
+  alias entry   %323 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6
+    alias entry   %324 = bitcast %"class.std::vector.0"* %323 to i64*
+  alias entry   %325 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1
+    alias entry   %326 = bitcast i64** %325 to i64*
+  alias entry   %330 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %9, i64 0, i32 0, i32 0, i32 0
+  alias entry   %331 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6
+    alias entry   %332 = bitcast %"class.std::vector.0"* %331 to i64*
+  alias entry   %333 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1
+    alias entry   %334 = bitcast i64** %333 to i64*
+  alias entry   %818 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, !dbg !13393
+  alias entry   %819 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 1, !dbg !13405
+    alias entry   %820 = bitcast %"struct.std::_Rb_tree_node_base"** %819 to %"struct.std::_Rb_tree_node"**, !dbg !13405
+  alias entry   %826 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, !dbg !13419
+  alias entry   %827 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13425
+    base alias entry   %827 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13425
+  alias entry   %828 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13435
+    base alias entry   %828 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13435
+  alias entry   %829 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 2, !dbg !13437
+  alias entry   %830 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, !dbg !13442
+  alias entry   %831 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 1, !dbg !13447
+    alias entry   %832 = bitcast %"struct.std::_Rb_tree_node_base"** %831 to %"struct.std::_Rb_tree_node"**, !dbg !13447
+  alias entry   %838 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, !dbg !13452
+  alias entry   %839 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13455
+    base alias entry   %839 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13455
+  alias entry   %840 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13462
+    base alias entry   %840 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13462
+  alias entry   %841 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 2, !dbg !13464
+    alias entry   %846 = bitcast %"struct.std::_Rb_tree_node_base"** %819 to i64*
+    alias entry   %848 = bitcast %"struct.std::_Rb_tree_node_base"* %826 to %"struct.std::_Rb_tree_node"*
+    alias entry   %850 = bitcast %"struct.std::_Rb_tree_node_base"** %831 to i64*
+    alias entry   %852 = bitcast %"struct.std::_Rb_tree_node_base"* %838 to %"struct.std::_Rb_tree_node"*
+    alias entry   %967 = load %"struct.std::_Rb_tree_node_base"*, %"struct.std::_Rb_tree_node_base"** %827, align 8, !dbg !14017, !tbaa !14018
+    alias entry   %1023 = load %"struct.std::_Rb_tree_node_base"*, %"struct.std::_Rb_tree_node_base"** %839, align 8, !dbg !14306, !tbaa !14018
+Round 1
+Round end
+    load (1.000000e+00) from i64* %4
+    load (9.999994e-01) from i64* %3
+    load (9.999963e-01) from %class.Graph* %0
+    load (9.999963e-01) from %class.Graph* %0
+    load (9.999963e-01) from %class.Graph* %0
+    load (9.999803e+00) from %"class.std::vector.0"* %6
+    load (1.999960e+01) from %"class.std::vector.0"* %6
+    load (6.249782e+00) from %"class.std::vector.0"* %5
+    load (1.249956e+01) from %"class.std::vector.0"* %5
+    load (9.999777e-01) from %"class.std::unordered_map"* %12
+    load (9.999777e-01) from %"class.std::unordered_map"* %12
+    load (9.999777e-01) from %"class.std::unordered_map"* %12
+    load (1.999809e+01) from %"class.std::vector.0"* %8
+    load (1.999807e+01) from %"class.std::unordered_map"* %12
+    load (1.999807e+01) from %"class.std::unordered_map"* %12
+Warning: wrong traversal order, or recursive call
+On function .omp_outlined..22
+Round 0
+  alias entry   %31 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+  alias entry   %33 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %35 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i64* %2
+    load (3.200000e-01) from %"class.std::vector.0"* %3
+    load (3.200000e-01) from %"class.std::vector.0"* %4
+    load (3.200000e-01) from %"class.std::vector.0"* %6
+    load (1.020000e+01) from i64* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %3
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %4
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 1.020000e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %6
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.24
+Round 0
+    alias entry   %3 = bitcast i8* %1 to i64**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to i64**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..25
+Round 0
+  alias entry   %33 = getelementptr inbounds %"class.std::vector.29", %"class.std::vector.29"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %35 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i32* %2
+    load (3.157895e-01) from %"class.std::vector.29"* %4
+    load (3.157895e-01) from %"class.std::vector.0"* %3
+    load (2.105263e-01) from i64* %5
+    store (2.105263e-01) to i64* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %3
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.29"* %4
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 2.105263e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.27
+Round 0
+    alias entry   %3 = bitcast i8* %1 to i64**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to i64**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..28
+Round 0
+  alias entry   %32 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i32* %2
+    load (3.157895e-01) from %"class.std::vector.0"* %4
+    load (2.105263e-01) from i64* %3
+    store (2.105263e-01) to i64* %3
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %3
+  load: 2.105263e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %4
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..30
+Round 0
+  alias entry   %20 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %2, i64 0, i32 0, i32 0, i32 0, !dbg !10503
+  alias entry   %34 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %35 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %7, i64 0, i32 0, i32 0, i32 0
+  alias entry   %36 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %6, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from %"class.std::vector.0"* %2
+    load (2.047500e+02) from %"class.std::vector.0"* %4
+    load (2.047500e+02) from %"class.std::vector.15"* %7
+    load (2.047500e+02) from %"class.std::vector.52"* %6
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %4
+  load: 2.047500e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.52"* %6
+  load: 2.047500e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.15"* %7
+  load: 2.047500e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z22createCommunityMPITypev
+Round 0
+Round end
+On function _Z23destroyCommunityMPITypev
+Round 0
+Round end
+On function _Z23updateRemoteCommunitiesRK5GraphRSt6vectorI4CommSaIS3_EERKSt3mapIlS3_St4lessIlESaISt4pairIKlS3_EEEii
+Round 0
+  alias entry   %19 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !10869
+  alias entry   %46 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !11050
+  alias entry   %48 = getelementptr inbounds %"class.std::map", %"class.std::map"* %2, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !11068
+    alias entry   %49 = bitcast %"struct.std::_Rb_tree_node_base"** %48 to i64*, !dbg !11068
+  alias entry   %51 = getelementptr inbounds %"class.std::map", %"class.std::map"* %2, i64 0, i32 0, i32 0, i32 1, !dbg !11085
+  alias entry   %55 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6
+    alias entry   %56 = bitcast %"class.std::vector.0"* %55 to i64*
+  alias entry   %57 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1
+    alias entry   %58 = bitcast i64** %57 to i64*
+Round 1
+Round end
+    load (1.000000e+00) from %class.Graph* %0
+    load (9.999994e-01) from %class.Graph* %0
+    load (9.999994e-01) from %"class.std::map"* %2
+    load (1.999985e+01) from %class.Graph* %0
+    load (1.999985e+01) from %class.Graph* %0
+  Frequency of %class.Graph* %0
+  load: 4.199970e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.15"* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::map"* %2
+  load: 9.999994e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..32
+Round 0
+  alias entry   %28 = getelementptr inbounds %"class.std::vector.66", %"class.std::vector.66"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %30 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i32* %2
+    load (3.137255e-01) from %"class.std::vector.66"* %4
+    load (3.137255e-01) from %"class.std::vector.0"* %3
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %3
+  load: 3.137255e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.66"* %4
+  load: 3.137255e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.34
+Round 0
+    alias entry   %3 = bitcast i8* %1 to i64**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to i64**, !dbg !10261
+  alias entry   %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261
+    alias entry   %8 = bitcast i8* %7 to i64**, !dbg !10261
+  alias entry   %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261
+    alias entry   %11 = bitcast i8* %10 to i64**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..35
+Round 0
+  alias entry   %36 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %38 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i32* %2
+    load (3.157895e-01) from %"class.std::vector.0"* %4
+    load (3.157895e-01) from %"class.std::vector.0"* %6
+    load (2.105263e-01) from i64* %3
+    store (2.105263e-01) to i64* %3
+    load (2.105263e-01) from i64* %5
+    store (2.105263e-01) to i64* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %3
+  load: 2.105263e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %4
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 2.105263e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %6
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..37
+Round 0
+  alias entry   %26 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %3, i64 0, i32 0, i32 0, i32 0
+  alias entry   %27 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %4, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i64* %2
+    load (6.350000e+00) from %"class.std::vector.52"* %3
+    load (6.350000e+00) from %"class.std::vector.15"* %4
+    load (6.350000e+00) from i64* %5
+    load (2.047500e+02) from i64* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.52"* %3
+  load: 6.350000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.15"* %4
+  load: 6.350000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 2.111000e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z18exchangeVertexReqsRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ii
+Round 0
+  alias entry   %15 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10306
+  alias entry   %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10319
+  alias entry   %51 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 1, !dbg !10485
+    alias entry   %52 = bitcast i64** %51 to i64*, !dbg !10485
+    alias entry   %54 = bitcast %"class.std::vector.0"* %4 to i64*, !dbg !10489
+  alias entry   %71 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 1, !dbg !10517
+    alias entry   %72 = bitcast i64** %71 to i64*, !dbg !10517
+    alias entry   %74 = bitcast %"class.std::vector.0"* %3 to i64*, !dbg !10518
+    alias entry   %91 = bitcast %"class.std::vector.0"* %3 to i8**
+  alias entry   %94 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+  alias entry   %99 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0, !dbg !10612
+    alias entry   %100 = bitcast %"class.std::vector.0"* %4 to i8**, !dbg !10612
+  alias entry   %129 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 1, !dbg !10673
+    alias entry   %130 = bitcast i64** %129 to i64*, !dbg !10673
+    alias entry   %132 = bitcast %"class.std::vector.0"* %5 to i64*, !dbg !10674
+  alias entry   %148 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 1, !dbg !10696
+    alias entry   %149 = bitcast i64** %148 to i64*, !dbg !10696
+    alias entry   %151 = bitcast %"class.std::vector.0"* %6 to i64*, !dbg !10697
+  alias entry   %191 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0
+  alias entry   %251 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0
+  alias entry   %310 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 2, !dbg !11244
+  alias entry   %311 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 2, !dbg !11245
+    alias entry   %312 = bitcast i64** %310 to i64*, !dbg !11249
+    alias entry   %314 = bitcast i64** %311 to i64*, !dbg !11250
+  alias entry   %320 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 2, !dbg !11279
+  alias entry   %321 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 2, !dbg !11280
+    alias entry   %322 = bitcast i64** %320 to i64*, !dbg !11284
+    alias entry   %324 = bitcast i64** %321 to i64*, !dbg !11285
+Round 1
+Round end
+    load (1.000000e+00) from %class.Graph* %0
+    load (1.000000e+00) from %class.Graph* %0
+    load (9.999984e-01) from %"class.std::vector.0"* %4
+    load (9.999984e-01) from %"class.std::vector.0"* %4
+Warning: wrong traversal order, or recursive call
+On function .omp_outlined..39
+Round 0
+  alias entry   %26 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 0, i32 0, i32 0, i32 0
+  alias entry   %27 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 1, i32 0, i32 0, i32 0
+  alias entry   %28 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 6
+    alias entry   %29 = bitcast %"class.std::vector.0"* %28 to i64*
+  alias entry   %30 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 6, i32 0, i32 0, i32 1
+    alias entry   %31 = bitcast i64** %30 to i64*
+  alias entry   %32 = getelementptr inbounds %"class.std::vector.29", %"class.std::vector.29"* %5, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.988141e+02) from %class.Graph* %3
+    load (3.180957e+03) from %class.Graph* %3
+    load (3.180957e+03) from %class.Graph* %3
+    load (3.180957e+03) from %class.Graph* %3
+    load (1.590478e+03) from %"class.std::vector.29"* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %class.Graph* %3
+  load: 9.741684e+03		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.29"* %5
+  load: 1.590478e+03		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.41
+Round 0
+    alias entry   %3 = bitcast i8* %1 to i64**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to i64**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..42
+Round 0
+  alias entry   %32 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i32* %2
+    load (3.157895e-01) from %"class.std::vector.0"* %4
+    load (2.105263e-01) from i64* %3
+    store (2.105263e-01) to i64* %3
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %3
+  load: 2.105263e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %4
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi
+Round 0
+  alias entry   %68 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 2, !dbg !11180
+  alias entry   %85 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !11380
+    alias entry   %86 = bitcast i64** %85 to i64*, !dbg !11380
+    alias entry   %88 = bitcast %class.Graph* %2 to i64*, !dbg !11384
+    alias entry   %93 = bitcast %class.Graph* %2 to i8**, !dbg !11392
+  alias entry   %98 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 6, !dbg !11399
+  alias entry   %99 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 6, i32 0, i32 0, i32 1, !dbg !11402
+    alias entry   %100 = bitcast i64** %99 to i64*, !dbg !11402
+    alias entry   %102 = bitcast %"class.std::vector.0"* %98 to i64*, !dbg !11403
+    alias entry   %107 = bitcast %"class.std::vector.0"* %98 to i8**, !dbg !11410
+  alias entry   %112 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 1, !dbg !11417
+  alias entry   %113 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 1, i32 0, i32 0, i32 1, !dbg !11424
+    alias entry   %114 = bitcast %struct.Edge** %113 to i64*, !dbg !11424
+    alias entry   %116 = bitcast %"class.std::vector.5"* %112 to i64*, !dbg !11428
+    alias entry   %121 = bitcast %"class.std::vector.5"* %112 to i8**, !dbg !11440
+Round 1
+Round end
+    load (9.999981e-01) from %class.Graph* %2
+Warning: wrong traversal order, or recursive call
+On function .omp_outlined..45
+Round 0
+Round end
+    call (1.058333e+01, 2.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %5
+    call (1.058333e+01, 1.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %6
+    call (1.058333e+01, 1.721875e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Edge* %7
+    call (1.058333e+01, 8.992188e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %8
+    call (1.058333e+01, 0.000000e+00, 1.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %9
+    call (1.058333e+01, 7.500000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using double* %10
+    call (1.058333e+01, 2.121875e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %11
+    call (1.058333e+01, 5.000000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %12
+    call (1.058333e+01, 5.000000e-01, 5.000000e-01, 0.000000e+00, 0.000000e+00) using double* %14
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 2.116667e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %6
+  load: 1.058333e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Edge* %7
+  load: 1.822318e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %8
+  load: 9.516732e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %9
+  load: 0.000000e+00		  store: 1.058333e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %10
+  load: 7.937500e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %11
+  load: 2.245651e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %12
+  load: 5.291667e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %14
+  load: 5.291667e+00		  store: 5.291667e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..46
+Round 0
+Round end
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %3
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %4
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Edge* %5
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %6
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %7
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %8
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %9
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %10
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %12
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..49
+Round 0
+  alias entry   %28 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (3.200000e-01) from %"class.std::vector.0"* %3
+    load (3.200000e-01) from i64** %4
+    load (3.200000e-01) from i64** %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %3
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64** %4
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64** %5
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function main
+Round 0
+    base alias entry   %14 = alloca i8**, align 8
+    alias entry   %33 = load i8**, i8*** %14, align 8, !dbg !10342, !tbaa !10335
+Round 1
+Round end
+  Frequency of i8** %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZN11GenerateRGGC2ElP19ompi_communicator_t
+Round 0
+  alias entry   %4 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 0, !dbg !10266
+  alias entry   %5 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10276
+    base alias entry   %5 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10276
+  alias entry   %6 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 5, !dbg !10279
+    alias entry   %8 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10281, !tbaa !10278
+  alias entry   %9 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 4, !dbg !10282
+  alias entry   %11 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 7, !dbg !10284
+  alias entry   %12 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 6, !dbg !10287
+  alias entry   %36 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 1, !dbg !10320
+    alias entry   %101 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10478, !tbaa !10278
+    alias entry   %172 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10565, !tbaa !10278
+  alias entry   %184 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2, !dbg !10579
+    alias entry   %191 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10583, !tbaa !10278
+Round 1
+Round end
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    load (1.000000e+00) from %class.GenerateRGG* %0
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    load (1.000000e+00) from %class.GenerateRGG* %0
+    load (5.000000e-01) from %class.GenerateRGG* %0
+    store (2.500000e-01) to %class.GenerateRGG* %0
+    store (3.437500e-01) to %class.GenerateRGG* %0
+    store (2.500000e-01) to %class.GenerateRGG* %0
+    load (1.000000e+00) from %class.GenerateRGG* %0
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    load (6.250000e-01) from %class.GenerateRGG* %0
+    load (6.250000e-01) from %class.GenerateRGG* %0
+    load (6.250000e-01) from %class.GenerateRGG* %0
+    load (7.656250e-01) from %class.GenerateRGG* %0
+    load (7.656250e-01) from %class.GenerateRGG* %0
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    load (1.000000e+00) from %class.GenerateRGG* %0
+    load (1.000000e+00) from %class.GenerateRGG* %0
+  Frequency of %class.GenerateRGG* %0
+  load: 8.906250e+00		  store: 6.843750e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.ompi_communicator_t* %2
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZN11GenerateRGG8generateEbbi
+Round 0
+  alias entry   %27 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 1, !dbg !10306
+  alias entry   %75 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 6, !dbg !10592
+  alias entry   %112 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 7, !dbg !10709
+  alias entry   %153 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 0, !dbg !10828
+  alias entry   %156 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 4, !dbg !10832
+  alias entry   %160 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 5, !dbg !10836
+  alias entry   %430 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10915
+  alias entry   %819 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !11101
+  alias entry   %895 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2
+  alias entry   %1233 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2
+  alias entry   %1536 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2
+Round 1
+Round end
+    load (1.000000e+00) from %class.GenerateRGG* %0
+    load (6.249994e-01) from %class.GenerateRGG* %0
+    load (9.999990e-01) from %class.GenerateRGG* %0
+    load (4.999995e-01) from %class.GenerateRGG* %0
+    load (3.124994e-01) from %class.GenerateRGG* %0
+    load (9.999985e-01) from %class.GenerateRGG* %0
+    load (4.999993e-01) from %class.GenerateRGG* %0
+    load (3.124992e-01) from %class.GenerateRGG* %0
+    load (9.999971e-01) from %class.GenerateRGG* %0
+    load (9.999971e-01) from %class.GenerateRGG* %0
+    load (9.999962e-01) from %class.GenerateRGG* %0
+    load (9.999962e-01) from %class.GenerateRGG* %0
+    load (4.999966e-01) from %class.GenerateRGG* %0
+    load (4.999971e-01) from %class.GenerateRGG* %0
+    load (4.999971e-01) from %class.GenerateRGG* %0
+    load (4.999966e-01) from %class.GenerateRGG* %0
+    load (9.999923e-01) from %class.GenerateRGG* %0
+    load (9.999914e-01) from %class.GenerateRGG* %0
+    load (3.749968e-01) from %class.GenerateRGG* %0
+    load (3.749964e-01) from %class.GenerateRGG* %0
+    load (9.999890e-01) from %class.GenerateRGG* %0
+    load (9.998746e-01) from %class.GenerateRGG* %0
+    load (3.199362e+02) from %class.GenerateRGG* %0
+    load (3.199361e+02) from %class.GenerateRGG* %0
+    load (6.249210e-01) from %class.GenerateRGG* %0
+    load (6.249210e-01) from %class.GenerateRGG* %0
+    load (6.249210e-01) from %class.GenerateRGG* %0
+    load (9.998736e-01) from %class.GenerateRGG* %0
+    load (9.998736e-01) from %class.GenerateRGG* %0
+    load (9.998736e-01) from %class.GenerateRGG* %0
+    load (9.998736e-01) from %class.GenerateRGG* %0
+    load (9.998726e-01) from %class.GenerateRGG* %0
+    load (9.998726e-01) from %class.GenerateRGG* %0
+    load (9.998726e-01) from %class.GenerateRGG* %0
+    load (9.998726e-01) from %class.GenerateRGG* %0
+    load (9.998717e-01) from %class.GenerateRGG* %0
+    load (9.998717e-01) from %class.GenerateRGG* %0
+    load (9.998717e-01) from %class.GenerateRGG* %0
+    load (9.998717e-01) from %class.GenerateRGG* %0
+    load (9.998707e-01) from %class.GenerateRGG* %0
+    load (9.998707e-01) from %class.GenerateRGG* %0
+    load (9.998707e-01) from %class.GenerateRGG* %0
+    load (9.998707e-01) from %class.GenerateRGG* %0
+    load (9.998698e-01) from %class.GenerateRGG* %0
+    load (4.999349e-01) from %class.GenerateRGG* %0
+    load (2.499674e-01) from %class.GenerateRGG* %0
+    load (7.997451e+01) from %class.GenerateRGG* %0
+    load (3.998725e+01) from %class.GenerateRGG* %0
+    load (3.998725e+01) from %class.GenerateRGG* %0
+    load (7.997448e+01) from %class.GenerateRGG* %0
+    load (4.999063e-01) from %class.GenerateRGG* %0
+    load (2.499531e-01) from %class.GenerateRGG* %0
+    load (7.996993e+01) from %class.GenerateRGG* %0
+    load (3.998497e+01) from %class.GenerateRGG* %0
+    load (3.998497e+01) from %class.GenerateRGG* %0
+    load (7.996991e+01) from %class.GenerateRGG* %0
+    load (9.998126e-01) from %class.GenerateRGG* %0
+    load (9.998116e-01) from %class.GenerateRGG* %0
+    load (9.998116e-01) from %class.GenerateRGG* %0
+    load (9.998116e-01) from %class.GenerateRGG* %0
+    load (9.998107e-01) from %class.GenerateRGG* %0
+    load (9.998107e-01) from %class.GenerateRGG* %0
+    load (9.998107e-01) from %class.GenerateRGG* %0
+    load (9.998091e-01) from %class.GenerateRGG* %0
+    load (9.998091e-01) from %class.GenerateRGG* %0
+    load (9.998091e-01) from %class.GenerateRGG* %0
+    load (9.998082e-01) from %class.GenerateRGG* %0
+    load (9.998082e-01) from %class.GenerateRGG* %0
+    load (9.998082e-01) from %class.GenerateRGG* %0
+    load (9.998072e-01) from %class.GenerateRGG* %0
+    load (9.998015e-01) from %class.GenerateRGG* %0
+    load (6.248724e-01) from %class.GenerateRGG* %0
+    load (6.248718e-01) from %class.GenerateRGG* %0
+    load (1.952724e-01) from %class.GenerateRGG* %0
+    load (3.905445e-01) from %class.GenerateRGG* %0
+    load (3.905442e-01) from %class.GenerateRGG* %0
+    load (6.248393e-01) from %class.GenerateRGG* %0
+    load (1.249644e+01) from %class.GenerateRGG* %0
+    load (1.249643e+01) from %class.GenerateRGG* %0
+    load (1.171538e+00) from %class.GenerateRGG* %0
+    load (5.857690e-01) from %class.GenerateRGG* %0
+    load (2.928845e-01) from %class.GenerateRGG* %0
+    load (1.464422e-01) from %class.GenerateRGG* %0
+    load (6.248387e-01) from %class.GenerateRGG* %0
+    load (6.248381e-01) from %class.GenerateRGG* %0
+    load (1.249638e+01) from %class.GenerateRGG* %0
+    load (6.248253e-01) from %class.GenerateRGG* %0
+    load (3.905154e-01) from %class.GenerateRGG* %0
+    load (2.440719e-01) from %class.GenerateRGG* %0
+    load (6.248247e-01) from %class.GenerateRGG* %0
+    load (4.881438e+00) from %class.GenerateRGG* %0
+    load (9.997431e-01) from %class.GenerateRGG* %0
+    load (9.997421e-01) from %class.GenerateRGG* %0
+    load (9.997406e-01) from %class.GenerateRGG* %0
+    load (9.997406e-01) from %class.GenerateRGG* %0
+    load (6.248378e-01) from %class.GenerateRGG* %0
+    load (1.999481e+01) from %class.GenerateRGG* %0
+    load (9.997388e-01) from %class.GenerateRGG* %0
+    load (9.997385e-01) from %class.GenerateRGG* %0
+  Frequency of %class.GenerateRGG* %0
+  load: 1.248245e+03		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZN14BinaryEdgeList4readEiiiSs
+Round 0
+  alias entry   %39 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 4, !dbg !10380
+  alias entry   %41 = getelementptr inbounds %"class.std::basic_string", %"class.std::basic_string"* %4, i64 0, i32 0, i32 0, !dbg !10388
+  alias entry   %99 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 0, !dbg !10514
+    alias entry   %100 = bitcast %class.BinaryEdgeList* %0 to i8*, !dbg !10515
+  alias entry   %104 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 1, !dbg !10518
+    alias entry   %105 = bitcast i64* %104 to i8*, !dbg !10519
+  alias entry   %118 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 2, !dbg !10532
+  alias entry   %183 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 3, !dbg !10605
+Round 1
+Round end
+    load (9.999971e-01) from %class.BinaryEdgeList* %0
+    load (9.999971e-01) from %"class.std::basic_string"* %4
+    load (6.249948e-01) from %class.BinaryEdgeList* %0
+    load (9.999905e-01) from %class.BinaryEdgeList* %0
+    store (9.999905e-01) to %class.BinaryEdgeList* %0
+    load (9.999895e-01) from %class.BinaryEdgeList* %0
+    load (9.999886e-01) from %class.BinaryEdgeList* %0
+    load (9.999886e-01) from %class.BinaryEdgeList* %0
+    load (9.999729e-01) from %class.BinaryEdgeList* %0
+    store (9.999729e-01) to %class.BinaryEdgeList* %0
+    load (9.999714e-01) from %class.BinaryEdgeList* %0
+    load (9.999714e-01) from %class.BinaryEdgeList* %0
+    load (9.999547e-01) from %class.BinaryEdgeList* %0
+    load (1.999909e+01) from %class.BinaryEdgeList* %0
+  Frequency of %class.BinaryEdgeList* %0
+  load: 2.962391e+01		  store: 1.999963e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::basic_string"* %4
+  load: 9.999971e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt8_Rb_treeIlSt4pairIKl4CommESt10_Select1stIS3_ESt4lessIlESaIS3_EE8_M_eraseEPSt13_Rb_tree_nodeIS3_E
+Round 0
+Round end
+Warning: wrong traversal order, or recursive call
+On function _ZN5GraphC2EllllP19ompi_communicator_t
+Round 0
+  alias entry   %8 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, !dbg !10272
+  alias entry   %9 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 1, !dbg !10272
+  alias entry   %10 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10309
+    alias entry   %11 = bitcast %class.Graph* %0 to i8*, !dbg !10309
+  alias entry   %12 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 3, !dbg !10320
+  alias entry   %13 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 4, !dbg !10322
+  alias entry   %14 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 5, !dbg !10324
+  alias entry   %15 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, !dbg !10272
+    alias entry   %16 = bitcast %"class.std::vector.0"* %15 to i8*, !dbg !10332
+  alias entry   %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10334
+    base alias entry   %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10334
+  alias entry   %18 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 9, !dbg !10336
+    alias entry   %21 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %17, align 8, !dbg !10338, !tbaa !10335
+  alias entry   %22 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 8, !dbg !10339
+  alias entry   %28 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !10361
+    alias entry   %29 = bitcast i64** %28 to i64*, !dbg !10361
+    alias entry   %31 = bitcast %class.Graph* %0 to i64*, !dbg !10365
+  alias entry   %45 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 1, i32 0, i32 0, i32 1, !dbg !10416
+    alias entry   %46 = bitcast %struct.Edge** %45 to i64*, !dbg !10416
+    alias entry   %48 = bitcast %"class.std::vector.5"* %9 to i64*, !dbg !10420
+  alias entry   %64 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1, !dbg !10455
+    alias entry   %65 = bitcast i64** %64 to i64*, !dbg !10455
+    alias entry   %67 = bitcast %"class.std::vector.0"* %15 to i64*, !dbg !10456
+  alias entry   %76 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %15, i64 0, i32 0, i32 0, i32 0
+  alias entry   %111 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %15, i64 0, i32 0, i32 0, i32 0, !dbg !10511
+  alias entry   %117 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %9, i64 0, i32 0, i32 0, i32 0, !dbg !10547
+  alias entry   %123 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, i32 0, i32 0, i32 0, !dbg !10576
+Round 1
+Round end
+    store (1.000000e+00) to %class.Graph* %0
+    store (1.000000e+00) to %class.Graph* %0
+    store (1.000000e+00) to %class.Graph* %0
+    store (1.000000e+00) to %class.Graph* %0
+    store (1.000000e+00) to %class.Graph* %0
+    load (9.999990e-01) from %class.Graph* %0
+    load (9.999980e-01) from %class.Graph* %0
+    load (9.999980e-01) from %class.Graph* %0
+    load (9.999980e-01) from %class.Graph* %0
+Warning: wrong traversal order, or recursive call
+On function _ZN3LCGC2EjPdlP19ompi_communicator_t
+Round 0
+  alias entry   %6 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 3, !dbg !10268
+  alias entry   %7 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 4, !dbg !10277
+  alias entry   %8 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 6, !dbg !10279
+    base alias entry   %8 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 6, !dbg !10279
+  alias entry   %9 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, !dbg !10281
+    alias entry   %10 = bitcast %"class.std::vector.0"* %9 to i8*, !dbg !10300
+  alias entry   %11 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0, !dbg !10302
+    base alias entry   %11 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0, !dbg !10302
+  alias entry   %12 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 1, !dbg !10306
+    alias entry   %15 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %11, align 8, !dbg !10308, !tbaa !10305
+  alias entry   %16 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 2, !dbg !10309
+  alias entry   %20 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, i32 0, i32 0, i32 1, !dbg !10326
+    alias entry   %21 = bitcast i64** %20 to i64*, !dbg !10326
+    alias entry   %23 = bitcast %"class.std::vector.0"* %9 to i64*, !dbg !10330
+  alias entry   %42 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 5, !dbg !10359
+  alias entry   %45 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %9, i64 0, i32 0, i32 0, i32 0, !dbg !10374
+  alias entry   %52 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 5, !dbg !10399
+    alias entry   %53 = bitcast i64* %52 to i8*, !dbg !10400
+    alias entry   %54 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %11, align 8, !dbg !10401, !tbaa !10305
+Round 1
+Round end
+    store (1.000000e+00) to %class.LCG* %0
+    store (1.000000e+00) to %class.LCG* %0
+    store (1.000000e+00) to %class.LCG* %0
+    store (1.000000e+00) to %class.LCG* %0
+    load (9.999989e-01) from %class.LCG* %0
+    load (9.999982e-01) from %class.LCG* %0
+    load (9.999982e-01) from %class.LCG* %0
+    load (9.999982e-01) from %class.LCG* %0
+Warning: wrong traversal order, or recursive call
+On function _ZNSt24uniform_int_distributionIiEclISt26linear_congruential_engineImLm16807ELm0ELm2147483647EEEEiRT_RKNS0_10param_typeE
+Round 0
+  alias entry   %5 = getelementptr inbounds %"struct.std::uniform_int_distribution<int>::param_type", %"struct.std::uniform_int_distribution<int>::param_type"* %2, i64 0, i32 1, !dbg !10267
+  alias entry   %8 = getelementptr inbounds %"struct.std::uniform_int_distribution<int>::param_type", %"struct.std::uniform_int_distribution<int>::param_type"* %2, i64 0, i32 0, !dbg !10279
+  alias entry   %19 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0
+  alias entry   %37 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0
+  alias entry   %51 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0, !dbg !10376
+Round 1
+Round end
+    load (1.000000e+00) from %"struct.std::uniform_int_distribution<int>::param_type"* %2
+    load (1.000000e+00) from %"struct.std::uniform_int_distribution<int>::param_type"* %2
+    load (5.000000e-01) from %"class.std::linear_congruential_engine"* %1
+    store (5.000000e-01) to %"class.std::linear_congruential_engine"* %1
+Warning: wrong traversal order, or recursive call
+On function _ZNSt6vectorIlSaIlEEaSERKS1_
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10278
+    alias entry   %6 = bitcast i64** %5 to i64*, !dbg !10278
+    alias entry   %8 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10285
+  alias entry   %12 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10294
+    alias entry   %13 = bitcast i64** %12 to i64*, !dbg !10294
+    alias entry   %15 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10296
+  alias entry   %33 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10459
+  alias entry   %41 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10490
+    alias entry   %42 = bitcast i64** %41 to i64*, !dbg !10490
+  alias entry   %53 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 0, !dbg !10573
+  alias entry   %73 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10633
+  alias entry   %76 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10635
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.0"* %1
+    load (6.250000e-01) from %"class.std::vector.0"* %1
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    load (9.765625e-02) from %"class.std::vector.0"* %1
+    load (9.765625e-02) from %"class.std::vector.0"* %0
+    load (9.765625e-02) from %"class.std::vector.0"* %0
+    load (9.765625e-02) from %"class.std::vector.0"* %1
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    store (6.250000e-01) to %"class.std::vector.0"* %0
+  Frequency of %"class.std::vector.0"* %0
+  load: 2.695312e+00		  store: 1.250000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %1
+  load: 1.445312e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorIlSaIlEE14_M_fill_insertEN9__gnu_cxx17__normal_iteratorIPlS1_EEmRKl
+Round 0
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10281
+    alias entry   %9 = bitcast i64** %8 to i64*, !dbg !10281
+  alias entry   %11 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10288
+    alias entry   %12 = bitcast i64** %11 to i64*, !dbg !10288
+    alias entry   %632 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10728
+  alias entry   %848 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10820
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (3.125000e-01) from i64* %3
+    load (9.765625e-02) from %"class.std::vector.0"* %0
+    store (1.562500e-01) to %"class.std::vector.0"* %0
+    store (1.562500e-01) to %"class.std::vector.0"* %0
+    load (9.765625e-02) from %"class.std::vector.0"* %0
+    store (1.562500e-01) to %"class.std::vector.0"* %0
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    load (3.125000e-01) from i64* %3
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+  Frequency of %"class.std::vector.0"* %0
+  load: 2.382812e+00		  store: 1.406250e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %3
+  load: 6.250000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorI4EdgeSaIS0_EE17_M_default_appendEm
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274
+    alias entry   %6 = bitcast %struct.Edge** %5 to i64*, !dbg !10274
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281
+    alias entry   %87 = bitcast %"class.std::vector.5"* %0 to i64*, !dbg !10375
+  alias entry   %108 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 0
+  alias entry   %115 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10431
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.5"* %0
+    load (6.250000e-01) from %"class.std::vector.5"* %0
+    load (3.125000e-01) from %"class.std::vector.5"* %0
+    load (1.953125e-01) from %"class.std::vector.5"* %0
+    load (1.953125e-01) from %"class.std::vector.5"* %0
+    load (3.125000e-01) from %"class.std::vector.5"* %0
+    store (3.125000e-01) to %"class.std::vector.5"* %0
+    store (3.125000e-01) to %"class.std::vector.5"* %0
+  Frequency of %"class.std::vector.5"* %0
+  load: 2.265625e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZN3LCG18parallel_prefix_opEv
+Round 0
+  alias entry   %10 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 4, !dbg !10283
+  alias entry   %169 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 1, !dbg !10361
+  alias entry   %175 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 2, !dbg !10269
+  alias entry   %179 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0
+  alias entry   %188 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 5, !dbg !10372
+  alias entry   %252 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, i32 0, i32 0, i32 0, !dbg !10372
+Round 1
+Round end
+    load (1.000000e+00) from %class.LCG* %0
+    load (1.000000e+00) from %class.LCG* %0
+    load (1.000000e+00) from %class.LCG* %0
+    load (1.000000e+00) from %class.LCG* %0
+    load (2.005882e+01) from %class.LCG* %0
+    load (2.005882e+01) from %class.LCG* %0
+    load (2.005882e+01) from %class.LCG* %0
+    load (2.005882e+01) from %class.LCG* %0
+    load (1.000000e+00) from %class.LCG* %0
+  Frequency of %class.LCG* %0
+  load: 8.523529e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorI9EdgeTupleSaIS0_EE17_M_default_appendEm
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10273
+    alias entry   %6 = bitcast %struct.EdgeTuple** %5 to i64*, !dbg !10273
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10280
+    alias entry   %65 = bitcast %"class.std::vector.84"* %0 to i64*, !dbg !10369
+  alias entry   %86 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0
+  alias entry   %93 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10425
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.84"* %0
+    load (6.250000e-01) from %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    load (1.953125e-01) from %"class.std::vector.84"* %0
+    load (1.953125e-01) from %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    store (3.125000e-01) to %"class.std::vector.84"* %0
+    store (3.125000e-01) to %"class.std::vector.84"* %0
+  Frequency of %"class.std::vector.84"* %0
+  load: 2.578125e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZSt9__find_ifIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_E_ET_SC_SC_T0_St26random_access_iterator_tag
+Round 0
+Round end
+On function _ZNSt6vectorI9EdgeTupleSaIS0_EE15_M_range_insertIN9__gnu_cxx17__normal_iteratorIPS0_S2_EEEEvS7_T_S8_St20forward_iterator_tag
+Round 0
+  alias entry   %13 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10344
+    alias entry   %14 = bitcast %struct.EdgeTuple** %13 to i64*, !dbg !10344
+  alias entry   %16 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10351
+    alias entry   %17 = bitcast %struct.EdgeTuple** %16 to i64*, !dbg !10351
+    alias entry   %120 = bitcast %"class.std::vector.84"* %0 to i64*, !dbg !10799
+  alias entry   %141 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0
+  alias entry   %146 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10851
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.84"* %0
+    load (6.250000e-01) from %"class.std::vector.84"* %0
+    load (9.765625e-02) from %"class.std::vector.84"* %0
+    store (1.562500e-01) to %"class.std::vector.84"* %0
+    load (9.765625e-02) from %"class.std::vector.84"* %0
+    store (1.562500e-01) to %"class.std::vector.84"* %0
+    load (9.765625e-02) from %"class.std::vector.84"* %0
+    store (1.562500e-01) to %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    load (1.953125e-01) from %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    store (3.125000e-01) to %"class.std::vector.84"* %0
+    store (3.125000e-01) to %"class.std::vector.84"* %0
+    store (3.125000e-01) to %"class.std::vector.84"* %0
+  Frequency of %"class.std::vector.84"* %0
+  load: 2.675781e+00		  store: 1.406250e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZSt16__introsort_loopIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEElZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_T0_T1_
+Round 0
+Round end
+On function _ZSt22__final_insertion_sortIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_T0_
+Round 0
+Round end
+On function _ZSt13__heap_selectIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_SC_T0_
+Round 0
+Round end
+On function _ZSt13__adjust_heapIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEElS2_ZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_T0_SD_T1_T2_
+Round 0
+Round end
+On function _ZSt22__move_median_to_firstIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_SC_SC_T0_
+Round 0
+Round end
+On function _ZNSt6vectorIlSaIlEE17_M_default_appendEm
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274
+    alias entry   %6 = bitcast i64** %5 to i64*, !dbg !10274
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281
+    alias entry   %20 = bitcast i64** %8 to i64*, !dbg !10380
+    alias entry   %21 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10381
+  alias entry   %42 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0
+    alias entry   %65 = bitcast %"class.std::vector.0"* %0 to i8**, !dbg !10628
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    load (1.953125e-01) from %"class.std::vector.0"* %0
+    load (1.953125e-01) from %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+  Frequency of %"class.std::vector.0"* %0
+  load: 2.265625e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorIdSaIdEE17_M_default_appendEm
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274
+    alias entry   %6 = bitcast double** %5 to i64*, !dbg !10274
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281
+    alias entry   %20 = bitcast double** %8 to i64*, !dbg !10381
+    alias entry   %21 = bitcast %"class.std::vector.10"* %0 to i64*, !dbg !10382
+  alias entry   %42 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 0
+    alias entry   %65 = bitcast %"class.std::vector.10"* %0 to i8**, !dbg !10630
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.10"* %0
+    load (6.250000e-01) from %"class.std::vector.10"* %0
+    load (3.125000e-01) from %"class.std::vector.10"* %0
+    load (3.125000e-01) from %"class.std::vector.10"* %0
+    load (1.953125e-01) from %"class.std::vector.10"* %0
+    load (1.953125e-01) from %"class.std::vector.10"* %0
+    store (3.125000e-01) to %"class.std::vector.10"* %0
+    store (3.125000e-01) to %"class.std::vector.10"* %0
+  Frequency of %"class.std::vector.10"* %0
+  load: 2.265625e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorI4CommSaIS0_EE17_M_default_appendEm
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10460
+    alias entry   %6 = bitcast %struct.Comm** %5 to i64*, !dbg !10460
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10467
+    alias entry   %20 = bitcast %"class.std::vector.15"* %0 to i64*, !dbg !10551
+  alias entry   %41 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 0
+  alias entry   %48 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10607
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.15"* %0
+    load (6.250000e-01) from %"class.std::vector.15"* %0
+    load (3.125000e-01) from %"class.std::vector.15"* %0
+    load (3.125000e-01) from %"class.std::vector.15"* %0
+    load (1.953125e-01) from %"class.std::vector.15"* %0
+    load (1.953125e-01) from %"class.std::vector.15"* %0
+    load (3.125000e-01) from %"class.std::vector.15"* %0
+    store (3.125000e-01) to %"class.std::vector.15"* %0
+    store (3.125000e-01) to %"class.std::vector.15"* %0
+  Frequency of %"class.std::vector.15"* %0
+  load: 2.578125e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt27__uninitialized_default_n_1ILb0EE18__uninit_default_nIPSt13unordered_setIlSt4hashIlESt8equal_toIlESaIlEEmEEvT_T0_
+Round 0
+Round end
+  Frequency of %"class.std::unordered_set"* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt10_HashtableIlSt4pairIKllESaIS2_ENSt8__detail10_Select1stESt8equal_toIlESt4hashIlENS4_18_Mod_range_hashingENS4_20_Default_ranged_hashENS4_20_Prime_rehash_policyENS4_17_Hashtable_traitsILb0ELb0ELb1EEEE21_M_insert_unique_nodeEmmPNS4_10_Hash_nodeIS2_Lb0EEE
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 4, !dbg !10268
+  alias entry   %6 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 4, i32 1, !dbg !10275
+  alias entry   %8 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 1, !dbg !10282
+  alias entry   %10 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 3, !dbg !10288
+  alias entry   %17 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 0
+  alias entry   %29 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 2, i32 0, !dbg !10428
+    alias entry   %30 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to %"struct.std::__detail::_Hash_node"**, !dbg !10429
+  alias entry   %32 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %29, i64 0, i32 0, !dbg !10432
+    alias entry   %35 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to i64*
+    base alias entry   %44 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %28, i64 %43, !dbg !10509
+    alias entry   %61 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10529, !tbaa !10511
+  alias entry   %62 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %61, i64 0, i32 0, !dbg !10530
+    alias entry   %77 = bitcast %"class.std::_Hashtable"* %0 to i8**, !dbg !10550
+    alias entry   %83 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i8*, !dbg !10618
+  alias entry   %87 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 0, !dbg !10296
+  alias entry   %94 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, !dbg !10627
+    alias entry   %95 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i64*, !dbg !10628
+    base alias entry   %97 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %96, i64 0, i32 0, !dbg !10630
+  alias entry   %99 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 2, i32 0, !dbg !10639
+    alias entry   %100 = bitcast %"struct.std::__detail::_Hash_node_base"* %99 to i64*, !dbg !10640
+  alias entry   %102 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, !dbg !10641
+  alias entry   %103 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, i32 0, !dbg !10641
+    alias entry   %104 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i64*, !dbg !10642
+  alias entry   %105 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %99, i64 0, i32 0, !dbg !10645
+    base alias entry   %105 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %99, i64 0, i32 0, !dbg !10645
+    base alias entry   %114 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %85, i64 %113, !dbg !10676
+    base alias entry   %118 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %117, i64 %86, !dbg !10678
+Round 1
+Warning: the first offset is not constant
+    alias entry   %45 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10509, !tbaa !10511
+    alias entry   %57 = bitcast %"struct.std::__detail::_Hash_node_base"* %45 to i64*, !dbg !10525
+  base alias offset entry (0)   %96 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %88, align 8, !dbg !10629, !tbaa !10511
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Round 2
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Round end
+    load (1.000000e+00) from %"class.std::_Hashtable"* %0
+    load (1.000000e+00) from %"class.std::_Hashtable"* %0
+    load (1.000000e+00) from %"class.std::_Hashtable"* %0
+    load (5.000000e-01) from %"class.std::_Hashtable"* %0
+    load (4.999995e-01) from %"class.std::_Hashtable"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable"* %0
+    load (3.749996e+00) from %"class.std::_Hashtable"* %0
+    store (3.749996e+00) to %"class.std::_Hashtable"* %0
+    load (6.249994e+00) from %"class.std::_Hashtable"* %0
+    store (6.249994e+00) to %"class.std::_Hashtable"* %0
+    store (4.768372e-07) to %"class.std::_Hashtable"* %0
+    load (4.999995e-01) from %"class.std::_Hashtable"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable"* %0
+    store (6.249997e-01) to %"struct.std::__detail::_Hash_node"* %3
+    load (3.749998e-01) from %"class.std::_Hashtable"* %0
+    store (3.749998e-01) to %"struct.std::__detail::_Hash_node"* %3
+    store (3.749998e-01) to %"class.std::_Hashtable"* %0
+    load (3.749998e-01) from %"struct.std::__detail::_Hash_node"* %3
+    load (2.343749e-01) from %"class.std::_Hashtable"* %0
+    load (2.343749e-01) from %"class.std::_Hashtable"* %0
+    load (9.999995e-01) from %"class.std::_Hashtable"* %0
+    store (9.999995e-01) to %"class.std::_Hashtable"* %0
+  Frequency of %"class.std::_Hashtable"* %0
+  load: 1.634374e+01		  store: 1.287499e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"struct.std::__detail::_Hash_node"* %3
+  load: 3.749998e-01		  store: 9.999995e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt10_HashtableIllSaIlENSt8__detail9_IdentityESt8equal_toIlESt4hashIlENS1_18_Mod_range_hashingENS1_20_Default_ranged_hashENS1_20_Prime_rehash_policyENS1_17_Hashtable_traitsILb0ELb1ELb1EEEE21_M_insert_unique_nodeEmmPNS1_10_Hash_nodeIlLb0EEE
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 4, !dbg !10268
+  alias entry   %6 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 4, i32 1, !dbg !10275
+  alias entry   %8 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 1, !dbg !10282
+  alias entry   %10 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 3, !dbg !10288
+  alias entry   %17 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 0
+  alias entry   %29 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 2, i32 0, !dbg !10428
+    alias entry   %30 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to %"struct.std::__detail::_Hash_node.61"**, !dbg !10429
+  alias entry   %32 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %29, i64 0, i32 0, !dbg !10432
+    alias entry   %35 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to i64*
+    base alias entry   %44 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %28, i64 %43, !dbg !10469
+    alias entry   %61 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10489, !tbaa !10471
+  alias entry   %62 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %61, i64 0, i32 0, !dbg !10490
+    alias entry   %77 = bitcast %"class.std::_Hashtable.34"* %0 to i8**, !dbg !10510
+    alias entry   %83 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i8*, !dbg !10578
+  alias entry   %87 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 0, !dbg !10296
+  alias entry   %94 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, !dbg !10587
+    alias entry   %95 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i64*, !dbg !10588
+    base alias entry   %97 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %96, i64 0, i32 0, !dbg !10590
+  alias entry   %99 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 2, i32 0, !dbg !10599
+    alias entry   %100 = bitcast %"struct.std::__detail::_Hash_node_base"* %99 to i64*, !dbg !10600
+  alias entry   %102 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, !dbg !10601
+  alias entry   %103 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, i32 0, !dbg !10601
+    alias entry   %104 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i64*, !dbg !10602
+  alias entry   %105 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %99, i64 0, i32 0, !dbg !10605
+    base alias entry   %105 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %99, i64 0, i32 0, !dbg !10605
+    base alias entry   %114 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %85, i64 %113, !dbg !10630
+    base alias entry   %118 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %117, i64 %86, !dbg !10632
+Round 1
+Warning: the first offset is not constant
+    alias entry   %45 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10469, !tbaa !10471
+    alias entry   %57 = bitcast %"struct.std::__detail::_Hash_node_base"* %45 to i64*, !dbg !10485
+  base alias offset entry (0)   %96 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %88, align 8, !dbg !10589, !tbaa !10471
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Round 2
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Round end
+    load (1.000000e+00) from %"class.std::_Hashtable.34"* %0
+    load (1.000000e+00) from %"class.std::_Hashtable.34"* %0
+    load (1.000000e+00) from %"class.std::_Hashtable.34"* %0
+    load (5.000000e-01) from %"class.std::_Hashtable.34"* %0
+    load (4.999995e-01) from %"class.std::_Hashtable.34"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable.34"* %0
+    load (3.749996e+00) from %"class.std::_Hashtable.34"* %0
+    store (3.749996e+00) to %"class.std::_Hashtable.34"* %0
+    load (6.249994e+00) from %"class.std::_Hashtable.34"* %0
+    store (6.249994e+00) to %"class.std::_Hashtable.34"* %0
+    store (4.768372e-07) to %"class.std::_Hashtable.34"* %0
+    load (4.999995e-01) from %"class.std::_Hashtable.34"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable.34"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable.34"* %0
+    store (6.249997e-01) to %"struct.std::__detail::_Hash_node.61"* %3
+    load (3.749998e-01) from %"class.std::_Hashtable.34"* %0
+    store (3.749998e-01) to %"struct.std::__detail::_Hash_node.61"* %3
+    store (3.749998e-01) to %"class.std::_Hashtable.34"* %0
+    load (3.749998e-01) from %"struct.std::__detail::_Hash_node.61"* %3
+    load (2.343749e-01) from %"class.std::_Hashtable.34"* %0
+    load (2.343749e-01) from %"class.std::_Hashtable.34"* %0
+    load (9.999995e-01) from %"class.std::_Hashtable.34"* %0
+    store (9.999995e-01) to %"class.std::_Hashtable.34"* %0
+  Frequency of %"class.std::_Hashtable.34"* %0
+  load: 1.634374e+01		  store: 1.287499e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"struct.std::__detail::_Hash_node.61"* %3
+  load: 3.749998e-01		  store: 9.999995e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorI8CommInfoSaIS0_EE17_M_default_appendEm
+Round 0
+  alias entry   %7 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10273
+    alias entry   %8 = bitcast %struct.CommInfo** %7 to i64*, !dbg !10273
+  alias entry   %10 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10280
+    alias entry   %59 = bitcast %struct.CommInfo** %10 to i64*, !dbg !10394
+    alias entry   %60 = bitcast %"class.std::vector.52"* %0 to i64*, !dbg !10395
+  alias entry   %81 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 0
+  alias entry   %89 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10449
+    alias entry   %143 = bitcast %"class.std::vector.52"* %0 to i8**, !dbg !10651
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.52"* %0
+    load (6.250000e-01) from %"class.std::vector.52"* %0
+    load (3.125000e-01) from %"class.std::vector.52"* %0
+    load (3.125000e-01) from %"class.std::vector.52"* %0
+    load (1.953125e-01) from %"class.std::vector.52"* %0
+    load (1.953125e-01) from %"class.std::vector.52"* %0
+    load (3.125000e-01) from %"class.std::vector.52"* %0
+    store (3.125000e-01) to %"class.std::vector.52"* %0
+    store (3.125000e-01) to %"class.std::vector.52"* %0
+  Frequency of %"class.std::vector.52"* %0
+  load: 2.578125e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _GLOBAL__sub_I_main.cpp
+Round 0
+Round end
+On function .omp_offloading.descriptor_unreg
+Round 0
+Round end
+  Frequency of i8* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_offloading.descriptor_reg.nvptx64-nvidia-cuda
+Round 0
+Round end
+  ---- Identify Target Regions ----
+  target call:   %49 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z21distComputeModularityRK5GraphP4CommPKddi_l396.region_id, i32 5, i8** nonnull %24, i8** nonnull %26, i64* nonnull %28, i64* getelementptr inbounds ([5 x i64], [5 x i64]* @.offload_maptypes, i64 0, i64 0), i32 0, i32 0), !dbg !10317
+  target call:   %24 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %9, i8** nonnull %11, i64* nonnull %13, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0), !dbg !10269
+  target call:   %26 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %11, i8** nonnull %13, i64* nonnull %15, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0), !dbg !10269
+  target call:   %258 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0)
+          to label %259 unwind label %319, !dbg !11559
+  target call:   %276 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368.region_id, i32 12, i8** nonnull %200, i8** nonnull %202, i64* nonnull %204, i64* getelementptr inbounds ([12 x i64], [12 x i64]* @.offload_maptypes.47, i64 0, i64 0), i32 0, i32 0), !dbg !11584
+  target call:   %325 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0)
+          to label %326 unwind label %319, !dbg !11667
+  ---- Target Distance Calculation ----
+_Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi converges after 3 iterations
+target 0: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) 
+target 1: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) 
+target 2: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) 
+target 3: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 9.152967e+00) (4: 1.000095e+00) (5: 2.000190e+00) 
+target 4: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 8.152880e+00) (4: 9.091440e+00) (5: 1.000095e+00) 
+target 5: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 7.152791e+00) (4: 8.091353e+00) (5: 9.029914e+00) 
+  ---- OMP (main.cpp, powerpc64le-unknown-linux-gnu) ----
+new entry   %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100)
+          to label %92 unwind label %291, !dbg !11387
+new entry   %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100)
+          to label %106 unwind label %295, !dbg !11405
+new entry   %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100)
+          to label %120 unwind label %299, !dbg !11431
+new entry   %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %129 unwind label %303, !dbg !11449
+new entry   %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %135 unwind label %307, !dbg !11462
+new entry   %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %141 unwind label %311, !dbg !11478
+new entry   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+new entry   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+new entry   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+Round 0
+  base alias entry   %130 = bitcast i64** %29 to i8**, !dbg !11450
+  base alias entry   %142 = bitcast i64** %30 to i8**, !dbg !11479
+  alias entry   %147 = bitcast i8* %145 to %struct.Comm*, !dbg !11487
+  alias entry   %158 = bitcast i8* %156 to double*, !dbg !11511
+  base alias entry   %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1
+  base alias entry   %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1
+  base alias entry   %192 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 2
+  base alias entry   %193 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 2
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias entry   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias entry   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias entry   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias entry   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias entry   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias entry   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias entry   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias entry   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias entry   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias entry   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias entry   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias entry   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias entry   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias entry   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias entry   %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1
+  base alias entry   %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1
+Warning: reach to function declaration __kmpc_fork_teams
+  alias entry (func arg) %struct.Comm* %1
+  alias entry (func arg) double* %2
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 1
+Round 1
+  base alias entry   %35 = bitcast i8** %34 to double**, !dbg !10317
+  base alias entry   %37 = bitcast i8** %36 to double**, !dbg !10317
+  base alias entry   %45 = bitcast i8** %44 to %struct.Comm**, !dbg !10317
+  base alias entry   %47 = bitcast i8** %46 to %struct.Comm**, !dbg !10317
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias entry   %29 = alloca i64*, align 8
+  base alias entry   %30 = alloca i64*, align 8
+  base alias offset entry (1)   %16 = alloca [3 x i8*], align 8
+  base alias offset entry (1)   %17 = alloca [3 x i8*], align 8
+  base alias offset entry (2)   %16 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %192 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 2
+  base alias offset entry (2)   %17 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %193 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 2
+  base alias offset entry (1)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (1)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (2)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-1)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (2)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-1)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (3)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-2)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (-1)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (3)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-2)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (-1)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (-3)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (-2)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (-1)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (-3)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (-2)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (-1)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (-4)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (-3)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (-2)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (-4)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (-3)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (-2)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (6)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-5)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (-4)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (-3)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (6)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-5)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (-4)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (-3)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (7)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-6)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (-5)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (-4)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (-1)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (7)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-6)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (-5)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (-4)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (-1)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (8)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-7)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-6)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-5)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-2)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-1)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (8)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-7)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-6)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-5)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-2)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-1)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-8)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-7)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-6)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-3)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-2)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-1)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-8)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-7)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-6)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-3)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-2)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-1)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (10)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-9)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-8)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-7)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-4)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-3)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-2)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (10)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-9)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-8)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-7)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-4)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-3)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-2)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-10)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-9)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-8)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-5)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-4)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-3)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-1)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-10)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-9)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-8)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-5)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-4)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-3)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-1)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %189, align 8, !dbg !11559
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %190, align 8, !dbg !11559
+Warning: reach to function declaration __kmpc_fork_teams
+  alias entry   %263 = load i64*, i64** %29, align 8, !dbg !11584, !tbaa !11451
+  alias entry   %264 = load i64*, i64** %30, align 8, !dbg !11584, !tbaa !11451
+  alias entry   %274 = ptrtoint i64* %263 to i64, !dbg !11584
+  alias entry   %275 = ptrtoint i64* %264 to i64, !dbg !11584
+  base alias entry   %215 = bitcast i8** %214 to i64*
+  base alias entry   %217 = bitcast i8** %216 to i64*
+  base alias entry   %220 = bitcast i8** %219 to i64*
+  base alias entry   %222 = bitcast i8** %221 to i64*
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 2
+Warning: reach to function declaration __kmpc_fork_call
+Round 2
+  base alias entry   %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317
+  base alias entry   %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317
+  base alias entry   %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317
+  base alias entry   %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias offset entry (2)   %11 = alloca [5 x i8*], align 8
+  base alias offset entry (2)   %12 = alloca [5 x i8*], align 8
+  base alias offset entry (-1)   %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317
+  base alias offset entry (-1)   %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317
+  base alias offset entry (4)   %11 = alloca [5 x i8*], align 8
+  base alias offset entry (-2)   %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317
+  base alias offset entry (4)   %12 = alloca [5 x i8*], align 8
+  base alias offset entry (-2)   %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias entry   %126 = bitcast i64** %29 to i8*, !dbg !11447
+  base alias entry   %139 = bitcast i64** %30 to i8*, !dbg !11477
+  base alias offset entry (1)   %184 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 0
+  base alias offset entry (2)   %184 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 0
+  base alias offset entry (1)   %186 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 0
+  base alias offset entry (2)   %186 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 0
+  base alias offset entry (1)   %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1
+  base alias offset entry (1)   %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1
+  base alias offset entry (1)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (2)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (3)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (6)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (7)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (8)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (10)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (1)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (2)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (3)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (6)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (7)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (8)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (10)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (1)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (2)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (5)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (6)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (7)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (9)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (1)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (2)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (5)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (6)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (7)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (9)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (1)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (4)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (5)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (6)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (8)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (1)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (4)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (5)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (6)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (8)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (3)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (4)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (5)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (7)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (3)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (4)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (5)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (7)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (2)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (3)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (4)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (6)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias entry   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (2)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (3)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (4)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (6)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias entry   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (1)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (2)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (3)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (5)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias entry   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (1)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (2)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (3)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (5)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias entry   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (1)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (2)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (4)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (1)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (2)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (4)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (1)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (3)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (1)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (3)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (2)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (2)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (1)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (1)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %189, align 8, !dbg !11559
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %190, align 8, !dbg !11559
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 3
+Warning: reach to function declaration __kmpc_fork_call
+Round 3
+  base alias offset entry (4)   %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317
+  base alias offset entry (2)   %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317
+  base alias offset entry (4)   %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317
+  base alias offset entry (2)   %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317
+  base alias offset entry (3)   %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317
+  base alias offset entry (1)   %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317
+  base alias offset entry (3)   %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317
+  base alias offset entry (1)   %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317
+  base alias offset entry (2)   %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317
+  base alias offset entry (2)   %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317
+  base alias offset entry (1)   %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317
+  base alias offset entry (1)   %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias offset entry (4)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (4)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (5)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-1)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (5)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-1)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (-2)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (-1)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (-2)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (-1)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (-3)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (-2)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (-3)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (-2)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (-4)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-3)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-4)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-3)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-5)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-4)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-5)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-4)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-6)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-5)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-6)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-5)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-7)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-6)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-7)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-6)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %189, align 8, !dbg !11559
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %190, align 8, !dbg !11559
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 4
+Warning: reach to function declaration __kmpc_fork_call
+Round 4
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias offset entry (4)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (5)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (4)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (5)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (3)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (4)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (3)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (4)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (2)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (3)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (2)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (3)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (1)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (2)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (1)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (2)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (1)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (1)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %189, align 8, !dbg !11559
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %190, align 8, !dbg !11559
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 5
+Warning: reach to function declaration __kmpc_fork_call
+Round 5
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %189, align 8, !dbg !11559
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %190, align 8, !dbg !11559
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 6
+Warning: reach to function declaration __kmpc_fork_call
+Round 6
+Warning: reach to function declaration __kmpc_fork_teams
+Round end
+  ---- Access Frequency Analysis ----
+  target call (1.625206e+01, 0.000000e+00, 5.076920e+00) using   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  target call (1.625206e+01, 0.000000e+00, 1.015380e+01) using   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+  target call (1.625204e+01, 1.015380e+01, 0.000000e+00) using   %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100)
+          to label %92 unwind label %291, !dbg !11387
+  target call (1.625204e+01, 5.076920e+00, 0.000000e+00) using   %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100)
+          to label %106 unwind label %295, !dbg !11405
+  target call (1.625204e+01, 8.757690e+01, 0.000000e+00) using   %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100)
+          to label %120 unwind label %299, !dbg !11431
+  target call (1.625204e+01, 4.569230e+01, 0.000000e+00) using   %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %129 unwind label %303, !dbg !11449
+  target call (1.625204e+01, 0.000000e+00, 5.076920e+00) using   %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %141 unwind label %311, !dbg !11478
+  target call (1.625204e+01, 3.807690e+00, 0.000000e+00) using   %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %135 unwind label %307, !dbg !11462
+  target call (1.625204e+01, 1.078710e+02, 0.000000e+00) using   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  target call (1.625204e+01, 2.538460e+00, 0.000000e+00) using   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+  target call (1.625204e+01, 2.538460e+00, 2.538460e+00) using   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+  target call (1.625202e+01, 1.015380e+01, 1.015380e+01) using   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  target call (1.625202e+01, 1.015380e+01, 0.000000e+00) using   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+  call (1.625199e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  call (1.625199e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+Frequency of   %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100)
+          to label %92 unwind label %291, !dbg !11387
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.650200e+02		  store: 0.000000e+00 (target)
+Frequency of   %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100)
+          to label %106 unwind label %295, !dbg !11405
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 8.251031e+01		  store: 0.000000e+00 (target)
+Frequency of   %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100)
+          to label %120 unwind label %299, !dbg !11431
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.423303e+03		  store: 0.000000e+00 (target)
+Frequency of   %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %129 unwind label %303, !dbg !11449
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 7.425931e+02		  store: 0.000000e+00 (target)
+Frequency of   %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %135 unwind label %307, !dbg !11462
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 6.188273e+01		  store: 0.000000e+00 (target)
+Frequency of   %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %141 unwind label %311, !dbg !11478
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 8.251031e+01 (target)
+Frequency of   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.918144e+03		  store: 2.475302e+02 (target)
+Frequency of   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 2.062750e+02		  store: 1.650201e+02 (target)
+Frequency of   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 4.125515e+01		  store: 4.125515e+01 (target)
+  ---- Optimization Preparation ----
+Rank 9 for   %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %135 unwind label %307, !dbg !11462
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 6.188273e+01		  store: 0.000000e+00 (target)
+Rank 8 for   %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100)
+          to label %106 unwind label %295, !dbg !11405
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 8.251031e+01		  store: 0.000000e+00 (target)
+Rank 7 for   %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %141 unwind label %311, !dbg !11478
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 8.251031e+01 (target)
+Rank 6 for   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 4.125515e+01		  store: 4.125515e+01 (target)
+Rank 5 for   %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100)
+          to label %92 unwind label %291, !dbg !11387
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.650200e+02		  store: 0.000000e+00 (target)
+Rank 4 for   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 2.062750e+02		  store: 1.650201e+02 (target)
+Rank 3 for   %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %129 unwind label %303, !dbg !11449
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 7.425931e+02		  store: 0.000000e+00 (target)
+Rank 2 for   %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100)
+          to label %120 unwind label %299, !dbg !11431
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.423303e+03		  store: 0.000000e+00 (target)
+Rank 1 for   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.918144e+03		  store: 2.475302e+02 (target)
+  ---- Data Mapping Optimization ----
+  target call:   %49 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z21distComputeModularityRK5GraphP4CommPKddi_l396.region_id, i32 5, i8** nonnull %24, i8** nonnull %26, i64* nonnull %28, i64* getelementptr inbounds ([5 x i64], [5 x i64]* @.offload_maptypes, i64 0, i64 0), i32 0, i32 0), !dbg !10317
+@.offload_maptypes = private unnamed_addr constant [5 x i64] [i64 800, i64 547, i64 33, i64 547, i64 33]
+  arg 2 (0.000000e+00, 0.000000e+00; 4.125515e+01, 4.125515e+01) is   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+    size is   %127 = shl i64 %69, 3, !dbg !11448
+    global reuse is 0x06
+    local reuse is 1.600380e+02, 1.280304e+03 after adjustment;		    scaled local reuse is 0x500
+    reuse distance is 0x01
+  arg 4 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x01
+    local reuse is 1.600380e+02, 2.560608e+03 after adjustment;		    scaled local reuse is 0xa00
+    reuse distance is 0x01
+    map type changed: @.offload_maptypes.0 = private unnamed_addr constant [5 x i64] [i64 800, i64 547, i64 1100853829665, i64 547, i64 1102195986465]
+  target call:   %24 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %9, i8** nonnull %11, i64* nonnull %13, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0), !dbg !10269
+@.offload_maptypes.15 = private unnamed_addr constant [3 x i64] [i64 800, i64 35, i64 33]
+  target call:   %26 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %11, i8** nonnull %13, i64* nonnull %15, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0), !dbg !10269
+@.offload_maptypes.20 = private unnamed_addr constant [3 x i64] [i64 800, i64 34, i64 34]
+  target call:   %258 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0)
+          to label %259 unwind label %319, !dbg !11559
+@.offload_maptypes.20 = private unnamed_addr constant [3 x i64] [i64 800, i64 34, i64 34]
+  arg 1 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x01
+    local reuse is 5.076920e+00, 4.061536e+01 after adjustment;		    scaled local reuse is 0x028
+    reuse distance is 0x01
+  arg 2 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x04
+    local reuse is 1.015380e+01, 1.624608e+02 after adjustment;		    scaled local reuse is 0x0a2
+    reuse distance is 0x01
+    map type changed: @.offload_maptypes.20.1 = private unnamed_addr constant [3 x i64] [i64 800, i64 1099553574946, i64 1099681513506]
+  target call:   %276 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368.region_id, i32 12, i8** nonnull %200, i8** nonnull %202, i64* nonnull %204, i64* getelementptr inbounds ([12 x i64], [12 x i64]* @.offload_maptypes.47, i64 0, i64 0), i32 0, i32 0), !dbg !11584
+@.offload_maptypes.47 = private unnamed_addr constant [12 x i64] [i64 800, i64 33, i64 33, i64 33, i64 33, i64 34, i64 33, i64 33, i64 35, i64 800, i64 35, i64 800]
+  arg 1 (0.000000e+00, 0.000000e+00; 1.650200e+02, 0.000000e+00) is   %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100)
+          to label %92 unwind label %291, !dbg !11387
+    size is   %90 = sub i64 %87, %89, !dbg !11386
+    global reuse is 0x05
+    local reuse is 1.015380e+01, 8.123040e+01 after adjustment;		    scaled local reuse is 0x051
+    reuse distance is 0x09
+  arg 2 (0.000000e+00, 0.000000e+00; 8.251031e+01, 0.000000e+00) is   %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100)
+          to label %106 unwind label %295, !dbg !11405
+    size is   %104 = sub i64 %101, %103, !dbg !11404
+    global reuse is 0x08
+    local reuse is 5.076920e+00, 4.061536e+01 after adjustment;		    scaled local reuse is 0x028
+    reuse distance is 0x09
+  arg 3 (0.000000e+00, 0.000000e+00; 1.423303e+03, 0.000000e+00) is   %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100)
+          to label %120 unwind label %299, !dbg !11431
+    size is   %118 = sub i64 %115, %117, !dbg !11430
+    global reuse is 0x02
+    local reuse is 8.757690e+01, 1.401230e+03 after adjustment;		    scaled local reuse is 0x579
+    reuse distance is 0x09
+  arg 4 (0.000000e+00, 0.000000e+00; 7.425931e+02, 0.000000e+00) is   %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %129 unwind label %303, !dbg !11449
+    size is   %127 = shl i64 %69, 3, !dbg !11448
+    global reuse is 0x03
+    local reuse is 4.569230e+01, 3.655384e+02 after adjustment;		    scaled local reuse is 0x16d
+    reuse distance is 0x09
+  arg 5 (0.000000e+00, 0.000000e+00; 0.000000e+00, 8.251031e+01) is   %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %141 unwind label %311, !dbg !11478
+    size is   %127 = shl i64 %69, 3, !dbg !11448
+    global reuse is 0x07
+    local reuse is 5.076920e+00, 4.061536e+01 after adjustment;		    scaled local reuse is 0x028
+    reuse distance is 0x09
+  arg 6 (0.000000e+00, 0.000000e+00; 6.188273e+01, 0.000000e+00) is   %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %135 unwind label %307, !dbg !11462
+    size is   %127 = shl i64 %69, 3, !dbg !11448
+    global reuse is 0x09
+    local reuse is 3.807690e+00, 3.046152e+01 after adjustment;		    scaled local reuse is 0x01e
+    reuse distance is 0x09
+  arg 7 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x01
+    local reuse is 1.078710e+02, 1.725936e+03 after adjustment;		    scaled local reuse is 0x6bd
+    reuse distance is 0x01
+  arg 8 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x04
+    local reuse is 2.538460e+00, 4.061536e+01 after adjustment;		    scaled local reuse is 0x028
+    reuse distance is 0x01
+  arg 10 (0.000000e+00, 0.000000e+00; 4.125515e+01, 4.125515e+01) is   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+    size is   %127 = shl i64 %69, 3, !dbg !11448
+    global reuse is 0x06
+    local reuse is 5.076920e+00, 4.061536e+01 after adjustment;		    scaled local reuse is 0x028
+    reuse distance is 0x09
+    map type changed: @.offload_maptypes.47.2 = private unnamed_addr constant [12 x i64] [i64 800, i64 9895689605153, i64 9895646625825, i64 9897073713185, i64 9895987392545, i64 9895646621730, i64 9895636144161, i64 1101320425505, i64 1099553587235, i64 800, i64 9895646617635, i64 800]
+  target call:   %325 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0)
+          to label %326 unwind label %319, !dbg !11667
+@.offload_maptypes.15 = private unnamed_addr constant [3 x i64] [i64 800, i64 35, i64 33]
+  arg 1 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x01
+    local reuse is 2.030760e+01, 3.249216e+02 after adjustment;		    scaled local reuse is 0x144
+    reuse distance is 0x07
+  arg 2 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x04
+    local reuse is 1.015380e+01, 1.624608e+02 after adjustment;		    scaled local reuse is 0x0a2
+    reuse distance is 0x07
+    map type changed: @.offload_maptypes.15.3 = private unnamed_addr constant [3 x i64] [i64 800, i64 7696921137187, i64 7696751280161]
+1 warning generated.
+In file included from main.cpp:58:
+In file included from ./dspl_gpu_kernel.hpp:58:
+In file included from ./graph.hpp:56:
+./utils.hpp:263:56: warning: using floating point absolute value function 'fabs' when argument is of integer type [-Wabsolute-value]
+                drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult); // 0-1
+                                                       ^
+./utils.hpp:263:56: note: use function 'std::abs' instead
+                drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult); // 0-1
+                                                       ^~~~
+                                                       std::abs
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+  ---- Function Argument Access Frequency CG Analysis ----
+On function __omp_offloading_33_128194f__Z21distComputeModularityRK5GraphP4CommPKddi_l396
+Round 0
+  alias entry   %71 = getelementptr inbounds double, double* %2, i64 %68, !dbg !45
+  alias entry   %74 = getelementptr inbounds %struct.Comm, %struct.Comm* %4, i64 %68, i32 1, !dbg !52
+Round 1
+Round end
+change loop scale from 32.0 to 1.0
+    load (1.600385e+02) from double* %2
+    load (1.600385e+02) from %struct.Comm* %4
+    load (6.227106e-02) from double* %1
+    store (6.227106e-02) to double* %1
+    load (6.227106e-02) from double* %3
+    store (6.227106e-02) to double* %3
+  Frequency of double* %1
+  load: 6.227106e-02		  store: 6.227106e-02 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %2
+  load: 1.600385e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %3
+  load: 6.227106e-02		  store: 6.227106e-02 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %4
+  load: 1.600385e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function __omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436
+Round 0
+  alias entry   %41 = getelementptr inbounds %struct.Comm, %struct.Comm* %2, i64 %40, i32 0, !dbg !45
+  alias entry   %43 = getelementptr inbounds %struct.Comm, %struct.Comm* %1, i64 %40, i32 0, !dbg !53
+  alias entry   %46 = getelementptr inbounds %struct.Comm, %struct.Comm* %2, i64 %40, i32 1, !dbg !55
+  alias entry   %48 = getelementptr inbounds %struct.Comm, %struct.Comm* %1, i64 %40, i32 1, !dbg !57
+Round 1
+Round end
+change loop scale from 32.0 to 1.0
+    load (5.076923e+00) from %struct.Comm* %2
+    load (5.076923e+00) from %struct.Comm* %1
+    store (5.076923e+00) to %struct.Comm* %1
+    load (5.076923e+00) from %struct.Comm* %2
+    load (5.076923e+00) from %struct.Comm* %1
+    store (5.076923e+00) to %struct.Comm* %1
+  Frequency of %struct.Comm* %1
+  load: 1.015385e+01		  store: 1.015385e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %2
+  load: 1.015385e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function __omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455
+Round 0
+  alias entry   %41 = getelementptr inbounds double, double* %1, i64 %40, !dbg !45
+  alias entry   %42 = getelementptr inbounds %struct.Comm, %struct.Comm* %2, i64 %40, i32 1, !dbg !52
+  alias entry   %43 = getelementptr inbounds %struct.Comm, %struct.Comm* %2, i64 %40, i32 0, !dbg !57
+Round 1
+Round end
+change loop scale from 32.0 to 1.0
+    store (5.076923e+00) to double* %1
+    store (5.076923e+00) to %struct.Comm* %2
+    store (5.076923e+00) to %struct.Comm* %2
+  Frequency of double* %1
+  load: 0.000000e+00		  store: 5.076923e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %2
+  load: 0.000000e+00		  store: 1.015385e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function __omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368
+Round 0
+Round end
+change loop scale from 32.0 to 1.0
+Warning: wrong traversal order, or recursive call
+On function _Z27distExecuteLouvainIterationlPKlS0_PK4EdgeS0_PlPKdP4CommS8_dPdi
+Round 0
+  alias entry   %91 = getelementptr inbounds i64, i64* %2, i64 %90, !dbg !35
+  alias entry   %93 = getelementptr inbounds i64, i64* %4, i64 %0, !dbg !38
+  alias entry   %96 = getelementptr inbounds i64, i64* %1, i64 %0, !dbg !40
+  alias entry   %99 = getelementptr inbounds i64, i64* %1, i64 %98, !dbg !42
+  alias entry   %103 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %95, i32 0, !dbg !45
+  alias entry   %105 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %95, i32 1, !dbg !49
+    base alias entry   %178 = select i1 %119, %struct.Edge** %13, %struct.Edge** %177
+  alias entry   %188 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %187, i32 0, !dbg !69
+  alias entry   %189 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %187, i32 1, !dbg !70
+  alias entry   %198 = getelementptr inbounds i64, i64* %4, i64 %197, !dbg !77
+    alias entry   %239 = bitcast double* %189 to i64*, !dbg !109
+  alias entry   %282 = getelementptr inbounds double, double* %10, i64 %0, !dbg !122
+  alias entry   %286 = getelementptr inbounds double, double* %6, i64 %0, !dbg !125
+  alias entry   %307 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %306, i32 1, !dbg !136
+  alias entry   %309 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %306, i32 0, !dbg !137
+  alias entry   %355 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %354, i32 1, !dbg !136
+  alias entry   %357 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %354, i32 0, !dbg !137
+  alias entry   %403 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %402, i32 1, !dbg !167
+    alias entry   %404 = bitcast double* %403 to i64*, !dbg !168
+  alias entry   %415 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %402, i32 0, !dbg !170
+  alias entry   %417 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %95, i32 1, !dbg !172
+    alias entry   %419 = bitcast double* %417 to i64*, !dbg !174
+  alias entry   %430 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %95, i32 0, !dbg !176
+  alias entry   %434 = getelementptr inbounds i64, i64* %5, i64 %0, !dbg !179
+  alias entry   %462 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %461, i32 1, !dbg !136
+  alias entry   %464 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %461, i32 0, !dbg !137
+Round 1
+Round end
+change loop scale from 32.0 to 1.0
+    load (1.000000e+00) from i64* %2
+    load (1.000000e+00) from i64* %4
+    load (1.000000e+00) from i64* %1
+    load (1.000000e+00) from i64* %1
+    load (5.000000e-01) from %struct.Comm* %7
+    load (5.000000e-01) from %struct.Comm* %7
+    load (8.000000e+00) from %struct.Edge* %3
+    load (4.000000e+00) from %struct.Edge* %3
+    load (8.000000e+00) from i64* %4
+    load (2.500000e+00) from %struct.Edge* %3
+    load (2.750000e+00) from %struct.Edge* %3
+    load (5.000000e-01) from double* %10
+    store (5.000000e-01) to double* %10
+    load (5.000000e-01) from double* %6
+    load (1.236264e-01) from %struct.Comm* %7
+    load (1.236264e-01) from %struct.Comm* %7
+    load (5.000000e+00) from %struct.Comm* %7
+    load (5.000000e+00) from %struct.Comm* %7
+    load (2.500000e-01) from %struct.Comm* %8
+    load (2.500000e-01) from double* %6
+    load (2.500000e-01) from %struct.Comm* %8
+    store (1.000000e+00) to i64* %5
+    load (5.000000e+00) from %struct.Comm* %7
+    load (5.000000e+00) from %struct.Comm* %7
+  Frequency of i64* %1
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Edge* %3
+  load: 1.725000e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %4
+  load: 9.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 0.000000e+00		  store: 1.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %6
+  load: 7.500000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %7
+  load: 2.124725e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %8
+  load: 5.000000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %10
+  load: 5.000000e-01		  store: 5.000000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z24distBuildLocalMapCounterllP7clmap_tRiPdS1_PK4EdgePKllll
+Round 0
+    base alias entry   %83 = select i1 %16, %struct.Edge** %12, %struct.Edge** %82
+  alias entry   %93 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %92, i32 0, !dbg !38
+  alias entry   %94 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %92, i32 1, !dbg !39
+  alias entry   %103 = getelementptr inbounds i64, i64* %7, i64 %102, !dbg !48
+  alias entry   %111 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %110, i32 0, !dbg !53
+  alias entry   %121 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %120, !dbg !61
+  alias entry   %131 = getelementptr inbounds double, double* %4, i64 %125, !dbg !70
+  alias entry   %138 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %120, i32 1, !dbg !75
+  alias entry   %139 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %120, i32 0, !dbg !76
+  alias entry   %146 = getelementptr inbounds double, double* %4, i64 %145, !dbg !83
+    alias entry   %147 = bitcast double* %146 to i64*, !dbg !84
+    alias entry   %148 = bitcast double* %94 to i64*, !dbg !85
+Round 1
+Round end
+change loop scale from 32.0 to 1.0
+    load (5.000000e-01) from %struct.Edge* %6
+    load (2.472527e-01) from %struct.Edge* %6
+    load (5.000000e-01) from i64* %7
+    load (5.000000e-01) from i32* %3
+    load (5.076923e+00) from %struct.clmap_t* %2
+    load (3.076923e-01) from i32* %5
+    load (1.538462e-01) from %struct.Edge* %6
+    load (1.538462e-01) from double* %4
+    store (1.538462e-01) to double* %4
+    store (1.703297e-01) to %struct.clmap_t* %2
+    store (1.703297e-01) to %struct.clmap_t* %2
+    store (1.703297e-01) to i32* %3
+    load (3.406593e-01) from i32* %5
+    load (1.703297e-01) from %struct.Edge* %6
+    store (1.703297e-01) to double* %4
+    store (1.703297e-01) to i32* %5
+  Frequency of %struct.clmap_t* %2
+  load: 5.076923e+00		  store: 3.406593e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %3
+  load: 5.000000e-01		  store: 1.703297e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %4
+  load: 1.538462e-01		  store: 3.241758e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %5
+  load: 6.483516e-01		  store: 1.703297e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Edge* %6
+  load: 1.071429e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %7
+  load: 5.000000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z15distGetMaxIndexP7clmap_tRiPdS1_dPK4Commdldllld
+Round 0
+  alias entry   %22 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 %21
+  alias entry   %28 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 0, !dbg !36
+  alias entry   %33 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %32, i32 1, !dbg !43
+  alias entry   %35 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %32, i32 0, !dbg !46
+  alias entry   %37 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 1, !dbg !48
+  alias entry   %41 = getelementptr inbounds double, double* %2, i64 %38, !dbg !52
+  alias entry   %60 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 1, !dbg !62
+  alias entry   %81 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %80, i32 1, !dbg !43
+  alias entry   %83 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %80, i32 0, !dbg !46
+  alias entry   %89 = getelementptr inbounds double, double* %2, i64 %86, !dbg !52
+  alias entry   %126 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %125, i32 1, !dbg !43
+  alias entry   %128 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %125, i32 0, !dbg !46
+  alias entry   %134 = getelementptr inbounds double, double* %2, i64 %131, !dbg !52
+Round 1
+Round end
+change loop scale from 32.0 to 1.0
+    load (1.000000e+00) from double* %2
+    load (1.000000e+00) from i32* %3
+    load (1.000000e+00) from i32* %1
+    load (5.000000e-01) from %struct.clmap_t* %0
+    load (2.500000e-01) from %struct.Comm* %5
+    load (2.500000e-01) from %struct.Comm* %5
+    load (2.500000e-01) from %struct.clmap_t* %0
+    load (1.250000e-01) from double* %2
+    load (3.125000e-01) from %struct.Comm* %5
+    load (3.125000e-01) from %struct.Comm* %5
+    load (1.562500e-01) from double* %2
+    load (3.125000e-01) from %struct.Comm* %5
+    load (3.125000e-01) from %struct.Comm* %5
+    load (1.562500e-01) from double* %2
+  Frequency of %struct.clmap_t* %0
+  load: 7.500000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %2
+  load: 1.437500e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %3
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %5
+  load: 1.750000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function __omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368
+Round 0
+Round end
+change loop scale from 32.0 to 1.0
+    call (5.076923e+00, 2.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %1
+    call (5.076923e+00, 1.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %2
+    call (5.076923e+00, 1.725000e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Edge* %3
+    call (5.076923e+00, 9.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %4
+    call (5.076923e+00, 0.000000e+00, 1.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %5
+    call (5.076923e+00, 7.500000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using double* %6
+    call (5.076923e+00, 2.124725e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %7
+    call (5.076923e+00, 5.000000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %8
+    call (5.076923e+00, 5.000000e-01, 5.000000e-01, 0.000000e+00, 0.000000e+00) using double* %10
+  Frequency of i64* %1
+  load: 1.015385e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %2
+  load: 5.076923e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Edge* %3
+  load: 8.757692e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %4
+  load: 4.569231e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 0.000000e+00		  store: 5.076923e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %6
+  load: 3.807692e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %7
+  load: 1.078707e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %8
+  load: 2.538462e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %10
+  load: 2.538462e+00		  store: 2.538462e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  ---- Identify Target Regions ----
+  ---- OMP (main.cpp, nvptx64-nvidia-cuda) ----
+Info: ignore malloc
+Info: ignore malloc
+Info: ignore malloc
+Round 0
+Round end
+  ---- Access Frequency Analysis ----
+  ---- Optimization Preparation ----
+  ---- Data Mapping Optimization ----
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+1 warning generated.
+  ---- Function Argument Access Frequency CG Analysis ----
+On function _Z7is_pwr2i
+Round 0
+Round end
+On function _Z8reseederj
+Round 0
+Round end
+On function _ZNSt8seed_seq8generateIN9__gnu_cxx17__normal_iteratorIPjSt6vectorIjSaIjEEEEEEvT_S8_
+Round 0
+  alias entry   %18 = getelementptr inbounds %"class.std::seed_seq", %"class.std::seed_seq"* %0, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !10369
+    alias entry   %19 = bitcast i32** %18 to i64*, !dbg !10369
+    alias entry   %21 = bitcast %"class.std::seed_seq"* %0 to i64*, !dbg !10376
+Round 1
+Round end
+    load (6.274510e-01) from %"class.std::seed_seq"* %0
+    load (6.274510e-01) from %"class.std::seed_seq"* %0
+  Frequency of %"class.std::seed_seq"* %0
+  load: 1.254902e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z4lockv
+Round 0
+Round end
+On function _Z6unlockv
+Round 0
+Round end
+On function _Z19distSumVertexDegreeRK5GraphRSt6vectorIdSaIdEERS2_I4CommSaIS6_EE
+Round 0
+  alias entry   %6 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10459
+Round 1
+Round end
+    load (1.000000e+00) from %class.Graph* %0
+  Frequency of %class.Graph* %0
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.10"* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.15"* %2
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function __clang_call_terminate
+Round 0
+Round end
+  Frequency of i8* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined.
+Round 0
+  alias entry   %25 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 0, i32 0, i32 0, i32 0
+  alias entry   %26 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 1, i32 0, i32 0, i32 0
+  alias entry   %27 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %28 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %5, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (6.350000e+00) from %class.Graph* %3
+    load (6.350000e+00) from %class.Graph* %3
+    load (6.350000e+00) from %"class.std::vector.10"* %4
+    load (6.350000e+00) from %"class.std::vector.15"* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %class.Graph* %3
+  load: 1.270000e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.10"* %4
+  load: 6.350000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.15"* %5
+  load: 6.350000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z29distCalcConstantForSecondTermRKSt6vectorIdSaIdEEP19ompi_communicator_t
+Round 0
+  alias entry   %9 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10283
+    alias entry   %10 = bitcast double** %9 to i64*, !dbg !10283
+    alias entry   %12 = bitcast %"class.std::vector.10"* %0 to i64*, !dbg !10288
+Round 1
+Round end
+    load (1.000000e+00) from %"class.std::vector.10"* %0
+    load (1.000000e+00) from %"class.std::vector.10"* %0
+  Frequency of %"class.std::vector.10"* %0
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.ompi_communicator_t* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func
+Round 0
+    alias entry   %3 = bitcast i8* %1 to double**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to double**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..2
+Round 0
+  alias entry   %32 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 0
+    alias entry   %98 = bitcast double* %3 to i64*, !dbg !10325
+Round 1
+Round end
+    load (3.157895e-01) from %"class.std::vector.10"* %4
+    load (2.105263e-01) from double* %3
+    store (2.105263e-01) to double* %3
+    load (2.105263e-01) from double* %3
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %3
+  load: 4.210526e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.10"* %4
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z12distInitCommRSt6vectorIlSaIlEES2_l
+Round 0
+  alias entry   %6 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10273
+    alias entry   %7 = bitcast i64** %6 to i64*, !dbg !10273
+    alias entry   %9 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10280
+Round 1
+Round end
+    load (1.000000e+00) from %"class.std::vector.0"* %1
+    load (1.000000e+00) from %"class.std::vector.0"* %1
+  Frequency of %"class.std::vector.0"* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %1
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..4
+Round 0
+  alias entry   %29 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+  alias entry   %31 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (3.200000e-01) from %"class.std::vector.0"* %3
+    load (3.200000e-01) from %"class.std::vector.0"* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %3
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %5
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z15distInitLouvainRK5GraphRSt6vectorIlSaIlEES5_RS2_IdSaIdEES8_RS2_I4CommSaIS9_EESC_Rdi
+Round 0
+  alias entry   %16 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !10485
+  alias entry   %20 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10502
+  alias entry   %22 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10514
+  alias entry   %24 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %3, i64 0, i32 0, i32 0, i32 1, !dbg !10532
+    alias entry   %25 = bitcast double** %24 to i64*, !dbg !10532
+    alias entry   %27 = bitcast %"class.std::vector.10"* %3 to i64*, !dbg !10536
+  alias entry   %40 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10572
+    alias entry   %41 = bitcast i64** %40 to i64*, !dbg !10572
+    alias entry   %43 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10574
+  alias entry   %56 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %2, i64 0, i32 0, i32 0, i32 1, !dbg !10600
+    alias entry   %57 = bitcast i64** %56 to i64*, !dbg !10600
+    alias entry   %59 = bitcast %"class.std::vector.0"* %2 to i64*, !dbg !10601
+  alias entry   %72 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 1, !dbg !10622
+    alias entry   %73 = bitcast double** %72 to i64*, !dbg !10622
+    alias entry   %75 = bitcast %"class.std::vector.10"* %4 to i64*, !dbg !10623
+  alias entry   %88 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %5, i64 0, i32 0, i32 0, i32 1, !dbg !10654
+    alias entry   %89 = bitcast %struct.Comm** %88 to i64*, !dbg !10654
+    alias entry   %91 = bitcast %"class.std::vector.15"* %5 to i64*, !dbg !10658
+  alias entry   %104 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %6, i64 0, i32 0, i32 0, i32 1, !dbg !10685
+    alias entry   %105 = bitcast %struct.Comm** %104 to i64*, !dbg !10685
+    alias entry   %107 = bitcast %"class.std::vector.15"* %6 to i64*, !dbg !10686
+Round 1
+Round end
+    load (1.000000e+00) from %class.Graph* %0
+    load (1.000000e+00) from %class.Graph* %0
+    load (1.000000e+00) from %class.Graph* %0
+    load (1.000000e+00) from %"class.std::vector.10"* %3
+    load (1.000000e+00) from %"class.std::vector.10"* %3
+Warning: wrong traversal order, or recursive call
+On function _Z15distGetMaxIndexP7clmap_tRiPdS1_dPK4Commdldllld
+Round 0
+  alias entry   %22 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 %21
+  alias entry   %28 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 0, !dbg !10320
+  alias entry   %33 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %32, i32 1, !dbg !10330
+  alias entry   %35 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %32, i32 0, !dbg !10333
+  alias entry   %37 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 1, !dbg !10335
+  alias entry   %41 = getelementptr inbounds double, double* %2, i64 %38, !dbg !10340
+  alias entry   %60 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 1, !dbg !10352
+  alias entry   %80 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %79, i32 1, !dbg !10330
+  alias entry   %82 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %79, i32 0, !dbg !10333
+  alias entry   %88 = getelementptr inbounds double, double* %2, i64 %85, !dbg !10340
+  alias entry   %124 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %123, i32 1, !dbg !10330
+  alias entry   %126 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %123, i32 0, !dbg !10333
+  alias entry   %132 = getelementptr inbounds double, double* %2, i64 %129, !dbg !10340
+Round 1
+Round end
+    load (1.000000e+00) from double* %2
+    load (1.000000e+00) from i32* %3
+    load (1.000000e+00) from i32* %1
+    load (5.000000e-01) from %struct.clmap_t* %0
+    load (2.500000e-01) from %struct.Comm* %5
+    load (2.500000e-01) from %struct.Comm* %5
+    load (2.500000e-01) from %struct.clmap_t* %0
+    load (1.250000e-01) from double* %2
+    load (9.984375e+00) from %struct.Comm* %5
+    load (9.984375e+00) from %struct.Comm* %5
+    load (4.984375e+00) from double* %2
+    load (9.984375e+00) from %struct.Comm* %5
+    load (9.984375e+00) from %struct.Comm* %5
+    load (4.984375e+00) from double* %2
+  Frequency of %struct.clmap_t* %0
+  load: 7.500000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %2
+  load: 1.109375e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %3
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %5
+  load: 4.043750e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z24distBuildLocalMapCounterllP7clmap_tRiPdS1_PK4EdgePKllll
+Round 0
+  alias entry   %20 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %19, i32 0, !dbg !10308
+  alias entry   %21 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %19, i32 1, !dbg !10310
+  alias entry   %30 = getelementptr inbounds i64, i64* %7, i64 %29, !dbg !10326
+  alias entry   %37 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %35, i32 0, !dbg !10337
+  alias entry   %45 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %34, !dbg !10348
+  alias entry   %55 = getelementptr inbounds double, double* %4, i64 %49, !dbg !10358
+  alias entry   %61 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %34, i32 0, !dbg !10364
+  alias entry   %62 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %34, i32 1, !dbg !10367
+    alias entry   %68 = bitcast double* %21 to i64*, !dbg !10375
+  alias entry   %71 = getelementptr inbounds double, double* %4, i64 %70, !dbg !10377
+    alias entry   %72 = bitcast double* %71 to i64*, !dbg !10378
+Round 1
+Round end
+    load (1.593750e+01) from %struct.Edge* %6
+    load (7.937500e+00) from %struct.Edge* %6
+    load (1.593750e+01) from i64* %7
+    load (1.593750e+01) from i32* %3
+    load (1.625000e+02) from %struct.clmap_t* %2
+    load (9.937500e+00) from i32* %5
+    load (4.937500e+00) from %struct.Edge* %6
+    load (4.937500e+00) from double* %4
+    store (4.937500e+00) to double* %4
+    store (5.437500e+00) to %struct.clmap_t* %2
+    store (5.437500e+00) to %struct.clmap_t* %2
+    store (5.437500e+00) to i32* %3
+    load (1.093750e+01) from i32* %5
+    load (5.437500e+00) from %struct.Edge* %6
+    store (5.437500e+00) to double* %4
+    store (5.437500e+00) to i32* %5
+  Frequency of %struct.clmap_t* %2
+  load: 1.625000e+02		  store: 1.087500e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %3
+  load: 1.593750e+01		  store: 5.437500e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %4
+  load: 4.937500e+00		  store: 1.037500e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %5
+  load: 2.087500e+01		  store: 5.437500e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Edge* %6
+  load: 3.425000e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %7
+  load: 1.593750e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z27distExecuteLouvainIterationlPKlS0_PK4EdgeS0_PlPKdP4CommS8_dPdi
+Round 0
+  alias entry   %18 = getelementptr inbounds i64, i64* %2, i64 %17, !dbg !10316
+  alias entry   %20 = getelementptr inbounds i64, i64* %4, i64 %0, !dbg !10322
+  alias entry   %23 = getelementptr inbounds i64, i64* %1, i64 %0, !dbg !10329
+  alias entry   %26 = getelementptr inbounds i64, i64* %1, i64 %25, !dbg !10332
+  alias entry   %30 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %22, i32 0, !dbg !10337
+  alias entry   %32 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %22, i32 1, !dbg !10341
+  alias entry   %47 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %46, i32 0, !dbg !10401
+  alias entry   %48 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %46, i32 1, !dbg !10403
+  alias entry   %57 = getelementptr inbounds i64, i64* %4, i64 %56, !dbg !10414
+    alias entry   %93 = bitcast double* %48 to i64*, !dbg !10457
+  alias entry   %116 = getelementptr inbounds double, double* %10, i64 %0, !dbg !10470
+  alias entry   %120 = getelementptr inbounds double, double* %6, i64 %0, !dbg !10473
+  alias entry   %137 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %136, i32 1, !dbg !10533
+  alias entry   %139 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %136, i32 0, !dbg !10534
+  alias entry   %183 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %182, i32 1, !dbg !10533
+  alias entry   %185 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %182, i32 0, !dbg !10534
+  alias entry   %230 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %229, i32 1, !dbg !10572
+    alias entry   %231 = bitcast double* %230 to i64*, !dbg !10573
+  alias entry   %242 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %229, i32 0, !dbg !10575
+  alias entry   %244 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %22, i32 1, !dbg !10578
+    alias entry   %246 = bitcast double* %244 to i64*, !dbg !10581
+  alias entry   %257 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %22, i32 0, !dbg !10583
+  alias entry   %261 = getelementptr inbounds i64, i64* %5, i64 %0, !dbg !10587
+  alias entry   %264 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %263, i32 1, !dbg !10533
+  alias entry   %266 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %263, i32 0, !dbg !10534
+Round 1
+Round end
+    load (1.000000e+00) from i64* %2
+    load (1.000000e+00) from i64* %4
+    load (1.000000e+00) from i64* %1
+    load (1.000000e+00) from i64* %1
+    load (5.000000e-01) from %struct.Comm* %7
+    load (5.000000e-01) from %struct.Comm* %7
+    load (7.992188e+00) from %struct.Edge* %3
+    load (3.992188e+00) from %struct.Edge* %3
+    load (7.992188e+00) from i64* %4
+    load (2.492188e+00) from %struct.Edge* %3
+    load (2.742188e+00) from %struct.Edge* %3
+    load (5.000000e-01) from double* %10
+    store (5.000000e-01) to double* %10
+    load (5.000000e-01) from double* %6
+    load (1.250000e-01) from %struct.Comm* %7
+    load (1.250000e-01) from %struct.Comm* %7
+    load (4.992188e+00) from %struct.Comm* %7
+    load (4.992188e+00) from %struct.Comm* %7
+    load (2.500000e-01) from %struct.Comm* %8
+    load (2.500000e-01) from double* %6
+    load (2.500000e-01) from %struct.Comm* %8
+    store (1.000000e+00) to i64* %5
+    load (4.992188e+00) from %struct.Comm* %7
+    load (4.992188e+00) from %struct.Comm* %7
+  Frequency of i64* %1
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Edge* %3
+  load: 1.721875e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %4
+  load: 8.992188e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 0.000000e+00		  store: 1.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %6
+  load: 7.500000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %7
+  load: 2.121875e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %8
+  load: 5.000000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %10
+  load: 5.000000e-01		  store: 5.000000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z21distComputeModularityRK5GraphP4CommPKddi
+Round 0
+  alias entry   %14 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10288
+  alias entry   %16 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10304
+    base alias entry   %35 = bitcast i8** %34 to double**, !dbg !10317
+    base alias entry   %37 = bitcast i8** %36 to double**, !dbg !10317
+    base alias entry   %45 = bitcast i8** %44 to %struct.Comm**, !dbg !10317
+    base alias entry   %47 = bitcast i8** %46 to %struct.Comm**, !dbg !10317
+Round 1
+  base alias entry   %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317
+  base alias entry   %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317
+  base alias entry   %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317
+  base alias entry   %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317
+Round 2
+  base alias offset entry (2)   %11 = alloca [5 x i8*], align 8
+  base alias offset entry (2)   %12 = alloca [5 x i8*], align 8
+  base alias offset entry (-1)   %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317
+  base alias offset entry (-1)   %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317
+  base alias offset entry (4)   %11 = alloca [5 x i8*], align 8
+  base alias offset entry (-2)   %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317
+  base alias offset entry (4)   %12 = alloca [5 x i8*], align 8
+  base alias offset entry (-2)   %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317
+Round 3
+  base alias offset entry (4)   %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317
+  base alias offset entry (2)   %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317
+  base alias offset entry (4)   %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317
+  base alias offset entry (2)   %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317
+  base alias offset entry (3)   %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317
+  base alias offset entry (1)   %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317
+  base alias offset entry (3)   %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317
+  base alias offset entry (1)   %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317
+  base alias offset entry (2)   %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317
+  base alias offset entry (2)   %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317
+  base alias offset entry (1)   %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317
+  base alias offset entry (1)   %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317
+Round 4
+Round end
+    load (1.000000e+00) from %class.Graph* %0
+    load (1.000000e+00) from %class.Graph* %0
+  Frequency of %class.Graph* %0
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %2
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.7
+Round 0
+    alias entry   %3 = bitcast i8* %1 to double**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to double**, !dbg !10261
+  alias entry   %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261
+    alias entry   %8 = bitcast i8* %7 to double**, !dbg !10261
+  alias entry   %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261
+    alias entry   %11 = bitcast i8* %10 to double**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..8
+Round 0
+  alias entry   %39 = getelementptr inbounds double, double* %6, i64 %38, !dbg !10318
+  alias entry   %42 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %38, i32 1, !dbg !10321
+    alias entry   %62 = bitcast double* %5 to i64*, !dbg !10329
+    alias entry   %74 = bitcast double* %7 to i64*, !dbg !10329
+Round 1
+Round end
+    load (1.010526e+01) from double* %6
+    load (1.010526e+01) from %struct.Comm* %8
+    load (2.105263e-01) from double* %5
+    store (2.105263e-01) to double* %5
+    load (2.105263e-01) from double* %7
+    store (2.105263e-01) to double* %7
+    load (2.105263e-01) from double* %5
+    load (2.105263e-01) from double* %7
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %5
+  load: 4.210526e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %6
+  load: 1.010526e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %7
+  load: 4.210526e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %8
+  load: 1.010526e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.9
+Round 0
+    alias entry   %3 = bitcast i8* %1 to double**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to double**, !dbg !10261
+  alias entry   %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261
+    alias entry   %8 = bitcast i8* %7 to double**, !dbg !10261
+  alias entry   %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261
+    alias entry   %11 = bitcast i8* %10 to double**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..10
+Round 0
+    alias entry   %65 = bitcast double* %3 to i64*, !dbg !10310
+    alias entry   %77 = bitcast double* %5 to i64*, !dbg !10310
+Round 1
+Round end
+    load (2.916667e-01) from double* %3
+    store (2.916667e-01) to double* %3
+    load (2.916667e-01) from double* %5
+    store (2.916667e-01) to double* %5
+    load (3.333333e-01) from double* %3
+    load (3.333333e-01) from double* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %3
+  load: 6.250000e-01		  store: 2.916667e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %4
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %5
+  load: 6.250000e-01		  store: 2.916667e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %6
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z20distUpdateLocalCinfolP4CommPKS_
+Round 0
+    base alias entry   %15 = bitcast i8** %14 to %struct.Comm**, !dbg !10269
+    base alias entry   %17 = bitcast i8** %16 to %struct.Comm**, !dbg !10269
+    base alias entry   %20 = bitcast i8** %19 to %struct.Comm**, !dbg !10269
+    base alias entry   %22 = bitcast i8** %21 to %struct.Comm**, !dbg !10269
+Round 1
+  base alias entry   %14 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269
+  base alias entry   %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269
+  base alias entry   %19 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269
+  base alias entry   %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269
+Round 2
+  base alias offset entry (1)   %5 = alloca [3 x i8*], align 8
+  base alias offset entry (1)   %6 = alloca [3 x i8*], align 8
+  base alias offset entry (2)   %5 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %19 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269
+  base alias offset entry (2)   %6 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269
+Round 3
+  base alias offset entry (1)   %9 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269
+  base alias offset entry (2)   %9 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269
+  base alias offset entry (1)   %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269
+  base alias offset entry (2)   %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269
+  base alias offset entry (1)   %14 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269
+  base alias offset entry (1)   %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269
+Round 4
+Round end
+  Frequency of %struct.Comm* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %2
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..13
+Round 0
+  alias entry   %33 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %26, !dbg !10304
+  alias entry   %34 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %25, i32 1, !dbg !10304
+  alias entry   %35 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %26, !dbg !10304
+  alias entry   %36 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %25, i32 1, !dbg !10304
+  alias entry   %37 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %26, i32 1, !dbg !10304
+  alias entry   %38 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %29, !dbg !10304
+  alias entry   %39 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %26, i32 1, !dbg !10304
+  alias entry   %40 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %29, !dbg !10304
+    alias entry   %41 = bitcast double* %36 to %struct.Comm*, !dbg !10304
+    alias entry   %43 = bitcast double* %34 to %struct.Comm*, !dbg !10304
+    alias entry   %46 = bitcast %struct.Comm* %40 to double*, !dbg !10304
+    alias entry   %48 = bitcast %struct.Comm* %38 to double*, !dbg !10304
+  alias entry   %63 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %57, i32 0, !dbg !10304
+  alias entry   %64 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %58, i32 0, !dbg !10304
+  alias entry   %65 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %59, i32 0, !dbg !10304
+  alias entry   %66 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %60, i32 0, !dbg !10304
+  alias entry   %67 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %61, i32 0, !dbg !10304
+  alias entry   %68 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %62, i32 0, !dbg !10304
+    alias entry   %69 = bitcast i64* %63 to <4 x i64>*, !dbg !10304
+    alias entry   %70 = bitcast i64* %64 to <4 x i64>*, !dbg !10304
+    alias entry   %71 = bitcast i64* %65 to <4 x i64>*, !dbg !10304
+    alias entry   %72 = bitcast i64* %66 to <4 x i64>*, !dbg !10304
+    alias entry   %73 = bitcast i64* %67 to <4 x i64>*, !dbg !10304
+    alias entry   %74 = bitcast i64* %68 to <4 x i64>*, !dbg !10304
+  alias entry   %93 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %57, i32 0, !dbg !10307
+  alias entry   %94 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %58, i32 0, !dbg !10307
+  alias entry   %95 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %59, i32 0, !dbg !10307
+  alias entry   %96 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %60, i32 0, !dbg !10307
+  alias entry   %97 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %61, i32 0, !dbg !10307
+  alias entry   %98 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %62, i32 0, !dbg !10307
+    alias entry   %99 = bitcast i64* %93 to <4 x i64>*, !dbg !10307
+    alias entry   %100 = bitcast i64* %94 to <4 x i64>*, !dbg !10307
+    alias entry   %101 = bitcast i64* %95 to <4 x i64>*, !dbg !10307
+    alias entry   %102 = bitcast i64* %96 to <4 x i64>*, !dbg !10307
+    alias entry   %103 = bitcast i64* %97 to <4 x i64>*, !dbg !10307
+    alias entry   %104 = bitcast i64* %98 to <4 x i64>*, !dbg !10307
+  alias entry   %135 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %57, i32 1, !dbg !10309
+  alias entry   %136 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %58, i32 1, !dbg !10309
+  alias entry   %137 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %59, i32 1, !dbg !10309
+  alias entry   %138 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %60, i32 1, !dbg !10309
+  alias entry   %139 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %61, i32 1, !dbg !10309
+  alias entry   %140 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %62, i32 1, !dbg !10309
+  alias entry   %147 = getelementptr inbounds double, double* %135, i64 -1, !dbg !10309
+    alias entry   %148 = bitcast double* %147 to <4 x double>*, !dbg !10309
+  alias entry   %149 = getelementptr inbounds double, double* %136, i64 -1, !dbg !10309
+    alias entry   %150 = bitcast double* %149 to <4 x double>*, !dbg !10309
+  alias entry   %151 = getelementptr inbounds double, double* %137, i64 -1, !dbg !10309
+    alias entry   %152 = bitcast double* %151 to <4 x double>*, !dbg !10309
+  alias entry   %153 = getelementptr inbounds double, double* %138, i64 -1, !dbg !10309
+    alias entry   %154 = bitcast double* %153 to <4 x double>*, !dbg !10309
+  alias entry   %155 = getelementptr inbounds double, double* %139, i64 -1, !dbg !10309
+    alias entry   %156 = bitcast double* %155 to <4 x double>*, !dbg !10309
+  alias entry   %157 = getelementptr inbounds double, double* %140, i64 -1, !dbg !10309
+    alias entry   %158 = bitcast double* %157 to <4 x double>*, !dbg !10309
+  alias entry   %178 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %177, i32 0, !dbg !10304
+  alias entry   %180 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %177, i32 0, !dbg !10307
+  alias entry   %183 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %177, i32 1, !dbg !10318
+  alias entry   %185 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %177, i32 1, !dbg !10309
+Round 1
+Round end
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %5
+    load (2.500000e+00) from %struct.Comm* %5
+    load (2.500000e+00) from %struct.Comm* %5
+    load (2.500000e+00) from %struct.Comm* %5
+    load (2.500000e+00) from %struct.Comm* %5
+    load (2.500000e+00) from %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    load (9.088235e+00) from %struct.Comm* %6
+    load (9.088235e+00) from %struct.Comm* %5
+    store (9.088235e+00) to %struct.Comm* %5
+    load (9.088235e+00) from %struct.Comm* %6
+    load (9.088235e+00) from %struct.Comm* %5
+    store (9.088235e+00) to %struct.Comm* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %5
+  load: 3.317647e+01		  store: 3.317647e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %6
+  load: 3.317647e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..14
+Round 0
+Round end
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %3
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %4
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z16distCleanCWandCUlPdP4Comm
+Round 0
+    base alias entry   %17 = bitcast i8** %16 to double**, !dbg !10269
+    base alias entry   %19 = bitcast i8** %18 to double**, !dbg !10269
+    base alias entry   %22 = bitcast i8** %21 to %struct.Comm**, !dbg !10269
+    base alias entry   %24 = bitcast i8** %23 to %struct.Comm**, !dbg !10269
+Round 1
+  base alias entry   %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269
+  base alias entry   %18 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269
+  base alias entry   %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269
+  base alias entry   %23 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269
+Round 2
+  base alias offset entry (1)   %5 = alloca [3 x i8*], align 8
+  base alias offset entry (1)   %6 = alloca [3 x i8*], align 8
+  base alias offset entry (2)   %5 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269
+  base alias offset entry (2)   %6 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %23 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269
+Round 3
+  base alias offset entry (1)   %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269
+  base alias offset entry (2)   %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269
+  base alias offset entry (1)   %13 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269
+  base alias offset entry (2)   %13 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269
+  base alias offset entry (1)   %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269
+  base alias offset entry (1)   %18 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269
+Round 4
+Round end
+  Frequency of double* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %2
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..18
+Round 0
+  alias entry   %29 = getelementptr inbounds double, double* %5, i64 %28, !dbg !10304
+  alias entry   %30 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %28, i32 0, !dbg !10309
+    alias entry   %33 = bitcast i64* %30 to i8*, !dbg !10299
+Round 1
+Round end
+    store (1.058333e+01) to double* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %5
+  load: 0.000000e+00		  store: 1.058333e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %6
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..19
+Round 0
+Round end
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %3
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %4
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z21fillRemoteCommunitiesRK5GraphiiRKmS3_RKSt6vectorIlSaIlEES8_S8_S8_S8_RKS4_I4CommSaIS9_EERSt3mapIlS9_St4lessIlESaISt4pairIKlS9_EEERSt13unordered_mapIllSt4hashIlESt8equal_toIlESaISH_ISI_lEEESM_
+Round 0
+  alias entry   %126 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !11433
+  alias entry   %130 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !11449
+  alias entry   %132 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !11460
+  alias entry   %190 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0
+  alias entry   %197 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0
+  alias entry   %299 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 2, i32 0, !dbg !11792
+    alias entry   %300 = bitcast %"struct.std::__detail::_Hash_node_base"* %299 to %"struct.std::__detail::_Hash_node"**, !dbg !11793
+    alias entry   %308 = bitcast %"class.std::unordered_map"* %12 to i8**, !dbg !11836
+  alias entry   %310 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 1, !dbg !11842
+    alias entry   %313 = bitcast %"struct.std::__detail::_Hash_node_base"* %299 to i8*, !dbg !11846
+  alias entry   %316 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %8, i64 0, i32 0, i32 0, i32 0
+  alias entry   %317 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0
+  alias entry   %318 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 0
+  alias entry   %319 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6
+    alias entry   %320 = bitcast %"class.std::vector.0"* %319 to i64*
+  alias entry   %321 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1
+    alias entry   %322 = bitcast i64** %321 to i64*
+  alias entry   %325 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %9, i64 0, i32 0, i32 0, i32 0
+  alias entry   %326 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6
+    alias entry   %327 = bitcast %"class.std::vector.0"* %326 to i64*
+  alias entry   %328 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1
+    alias entry   %329 = bitcast i64** %328 to i64*
+  alias entry   %800 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, !dbg !13393
+  alias entry   %801 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 1, !dbg !13405
+    alias entry   %802 = bitcast %"struct.std::_Rb_tree_node_base"** %801 to %"struct.std::_Rb_tree_node"**, !dbg !13405
+  alias entry   %808 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, !dbg !13419
+  alias entry   %809 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13425
+    base alias entry   %809 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13425
+  alias entry   %810 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13435
+    base alias entry   %810 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13435
+  alias entry   %811 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 2, !dbg !13437
+  alias entry   %812 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, !dbg !13442
+  alias entry   %813 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 1, !dbg !13447
+    alias entry   %814 = bitcast %"struct.std::_Rb_tree_node_base"** %813 to %"struct.std::_Rb_tree_node"**, !dbg !13447
+  alias entry   %820 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, !dbg !13452
+  alias entry   %821 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13455
+    base alias entry   %821 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13455
+  alias entry   %822 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13462
+    base alias entry   %822 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13462
+  alias entry   %823 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 2, !dbg !13464
+    alias entry   %828 = bitcast %"struct.std::_Rb_tree_node_base"** %801 to i64*
+    alias entry   %830 = bitcast %"struct.std::_Rb_tree_node_base"* %808 to %"struct.std::_Rb_tree_node"*
+    alias entry   %832 = bitcast %"struct.std::_Rb_tree_node_base"** %813 to i64*
+    alias entry   %834 = bitcast %"struct.std::_Rb_tree_node_base"* %820 to %"struct.std::_Rb_tree_node"*
+    alias entry   %943 = load %"struct.std::_Rb_tree_node_base"*, %"struct.std::_Rb_tree_node_base"** %809, align 8, !dbg !14017, !tbaa !14018
+    alias entry   %998 = load %"struct.std::_Rb_tree_node_base"*, %"struct.std::_Rb_tree_node_base"** %821, align 8, !dbg !14306, !tbaa !14018
+Round 1
+Round end
+    load (1.000000e+00) from i64* %4
+    load (9.999994e-01) from i64* %3
+    load (9.999963e-01) from %class.Graph* %0
+    load (9.999963e-01) from %class.Graph* %0
+    load (9.999963e-01) from %class.Graph* %0
+    load (9.999803e+00) from %"class.std::vector.0"* %6
+    load (1.999960e+01) from %"class.std::vector.0"* %6
+    load (6.249782e+00) from %"class.std::vector.0"* %5
+    load (1.249956e+01) from %"class.std::vector.0"* %5
+    load (9.999777e-01) from %"class.std::unordered_map"* %12
+    load (9.999777e-01) from %"class.std::unordered_map"* %12
+    load (9.999777e-01) from %"class.std::unordered_map"* %12
+    load (1.999809e+01) from %"class.std::vector.0"* %8
+    load (1.999807e+01) from %"class.std::unordered_map"* %12
+    load (1.999807e+01) from %"class.std::unordered_map"* %12
+Warning: wrong traversal order, or recursive call
+On function .omp_outlined..22
+Round 0
+  alias entry   %31 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+  alias entry   %33 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %35 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i64* %2
+    load (3.200000e-01) from %"class.std::vector.0"* %3
+    load (3.200000e-01) from %"class.std::vector.0"* %4
+    load (3.200000e-01) from %"class.std::vector.0"* %6
+    load (1.020000e+01) from i64* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %3
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %4
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 1.020000e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %6
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.24
+Round 0
+    alias entry   %3 = bitcast i8* %1 to i64**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to i64**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..25
+Round 0
+  alias entry   %33 = getelementptr inbounds %"class.std::vector.29", %"class.std::vector.29"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %35 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i32* %2
+    load (3.157895e-01) from %"class.std::vector.29"* %4
+    load (3.157895e-01) from %"class.std::vector.0"* %3
+    load (2.105263e-01) from i64* %5
+    store (2.105263e-01) to i64* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %3
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.29"* %4
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 2.105263e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.27
+Round 0
+    alias entry   %3 = bitcast i8* %1 to i64**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to i64**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..28
+Round 0
+  alias entry   %32 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i32* %2
+    load (3.157895e-01) from %"class.std::vector.0"* %4
+    load (2.105263e-01) from i64* %3
+    store (2.105263e-01) to i64* %3
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %3
+  load: 2.105263e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %4
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..30
+Round 0
+  alias entry   %20 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %2, i64 0, i32 0, i32 0, i32 0, !dbg !10503
+  alias entry   %34 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %35 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %7, i64 0, i32 0, i32 0, i32 0
+  alias entry   %36 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %6, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from %"class.std::vector.0"* %2
+    load (2.047500e+02) from %"class.std::vector.0"* %4
+    load (2.047500e+02) from %"class.std::vector.15"* %7
+    load (2.047500e+02) from %"class.std::vector.52"* %6
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %4
+  load: 2.047500e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.52"* %6
+  load: 2.047500e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.15"* %7
+  load: 2.047500e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z22createCommunityMPITypev
+Round 0
+Round end
+On function _Z23destroyCommunityMPITypev
+Round 0
+Round end
+On function _Z23updateRemoteCommunitiesRK5GraphRSt6vectorI4CommSaIS3_EERKSt3mapIlS3_St4lessIlESaISt4pairIKlS3_EEEii
+Round 0
+  alias entry   %19 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !10869
+  alias entry   %46 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !11050
+  alias entry   %48 = getelementptr inbounds %"class.std::map", %"class.std::map"* %2, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !11068
+    alias entry   %49 = bitcast %"struct.std::_Rb_tree_node_base"** %48 to i64*, !dbg !11068
+  alias entry   %51 = getelementptr inbounds %"class.std::map", %"class.std::map"* %2, i64 0, i32 0, i32 0, i32 1, !dbg !11085
+  alias entry   %55 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6
+    alias entry   %56 = bitcast %"class.std::vector.0"* %55 to i64*
+  alias entry   %57 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1
+    alias entry   %58 = bitcast i64** %57 to i64*
+Round 1
+Round end
+    load (1.000000e+00) from %class.Graph* %0
+    load (9.999994e-01) from %class.Graph* %0
+    load (9.999994e-01) from %"class.std::map"* %2
+    load (1.999985e+01) from %class.Graph* %0
+    load (1.999985e+01) from %class.Graph* %0
+  Frequency of %class.Graph* %0
+  load: 4.199970e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.15"* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::map"* %2
+  load: 9.999994e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..32
+Round 0
+  alias entry   %28 = getelementptr inbounds %"class.std::vector.66", %"class.std::vector.66"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %30 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i32* %2
+    load (3.137255e-01) from %"class.std::vector.66"* %4
+    load (3.137255e-01) from %"class.std::vector.0"* %3
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %3
+  load: 3.137255e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.66"* %4
+  load: 3.137255e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.34
+Round 0
+    alias entry   %3 = bitcast i8* %1 to i64**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to i64**, !dbg !10261
+  alias entry   %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261
+    alias entry   %8 = bitcast i8* %7 to i64**, !dbg !10261
+  alias entry   %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261
+    alias entry   %11 = bitcast i8* %10 to i64**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..35
+Round 0
+  alias entry   %36 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %38 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i32* %2
+    load (3.157895e-01) from %"class.std::vector.0"* %4
+    load (3.157895e-01) from %"class.std::vector.0"* %6
+    load (2.105263e-01) from i64* %3
+    store (2.105263e-01) to i64* %3
+    load (2.105263e-01) from i64* %5
+    store (2.105263e-01) to i64* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %3
+  load: 2.105263e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %4
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 2.105263e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %6
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..37
+Round 0
+  alias entry   %26 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %3, i64 0, i32 0, i32 0, i32 0
+  alias entry   %27 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %4, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i64* %2
+    load (6.350000e+00) from %"class.std::vector.52"* %3
+    load (6.350000e+00) from %"class.std::vector.15"* %4
+    load (6.350000e+00) from i64* %5
+    load (2.047500e+02) from i64* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.52"* %3
+  load: 6.350000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.15"* %4
+  load: 6.350000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 2.111000e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z18exchangeVertexReqsRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ii
+Round 0
+  alias entry   %15 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10306
+  alias entry   %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10319
+  alias entry   %51 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 1, !dbg !10485
+    alias entry   %52 = bitcast i64** %51 to i64*, !dbg !10485
+    alias entry   %54 = bitcast %"class.std::vector.0"* %4 to i64*, !dbg !10489
+  alias entry   %71 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 1, !dbg !10517
+    alias entry   %72 = bitcast i64** %71 to i64*, !dbg !10517
+    alias entry   %74 = bitcast %"class.std::vector.0"* %3 to i64*, !dbg !10518
+    alias entry   %91 = bitcast %"class.std::vector.0"* %3 to i8**
+  alias entry   %94 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+  alias entry   %98 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0, !dbg !10598
+    alias entry   %99 = bitcast %"class.std::vector.0"* %4 to i8**, !dbg !10598
+  alias entry   %128 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 1, !dbg !10673
+    alias entry   %129 = bitcast i64** %128 to i64*, !dbg !10673
+    alias entry   %131 = bitcast %"class.std::vector.0"* %5 to i64*, !dbg !10674
+  alias entry   %147 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 1, !dbg !10696
+    alias entry   %148 = bitcast i64** %147 to i64*, !dbg !10696
+    alias entry   %150 = bitcast %"class.std::vector.0"* %6 to i64*, !dbg !10697
+  alias entry   %190 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0
+  alias entry   %249 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0
+  alias entry   %306 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 2, !dbg !11244
+  alias entry   %307 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 2, !dbg !11245
+    alias entry   %308 = bitcast i64** %306 to i64*, !dbg !11249
+    alias entry   %310 = bitcast i64** %307 to i64*, !dbg !11250
+  alias entry   %316 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 2, !dbg !11279
+  alias entry   %317 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 2, !dbg !11280
+    alias entry   %318 = bitcast i64** %316 to i64*, !dbg !11284
+    alias entry   %320 = bitcast i64** %317 to i64*, !dbg !11285
+Round 1
+Round end
+    load (1.000000e+00) from %class.Graph* %0
+    load (1.000000e+00) from %class.Graph* %0
+    load (9.999984e-01) from %"class.std::vector.0"* %4
+    load (9.999984e-01) from %"class.std::vector.0"* %4
+Warning: wrong traversal order, or recursive call
+On function .omp_outlined..39
+Round 0
+  alias entry   %26 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 0, i32 0, i32 0, i32 0
+  alias entry   %27 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 1, i32 0, i32 0, i32 0
+  alias entry   %28 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 6
+    alias entry   %29 = bitcast %"class.std::vector.0"* %28 to i64*
+  alias entry   %30 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 6, i32 0, i32 0, i32 1
+    alias entry   %31 = bitcast i64** %30 to i64*
+  alias entry   %32 = getelementptr inbounds %"class.std::vector.29", %"class.std::vector.29"* %5, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.988141e+02) from %class.Graph* %3
+    load (3.180957e+03) from %class.Graph* %3
+    load (3.180957e+03) from %class.Graph* %3
+    load (3.180957e+03) from %class.Graph* %3
+    load (1.590478e+03) from %"class.std::vector.29"* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %class.Graph* %3
+  load: 9.741684e+03		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.29"* %5
+  load: 1.590478e+03		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.41
+Round 0
+    alias entry   %3 = bitcast i8* %1 to i64**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to i64**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..42
+Round 0
+  alias entry   %32 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i32* %2
+    load (3.157895e-01) from %"class.std::vector.0"* %4
+    load (2.105263e-01) from i64* %3
+    store (2.105263e-01) to i64* %3
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %3
+  load: 2.105263e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %4
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi
+Round 0
+  alias entry   %68 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 2, !dbg !11180
+  alias entry   %85 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !11380
+    alias entry   %86 = bitcast i64** %85 to i64*, !dbg !11380
+    alias entry   %88 = bitcast %class.Graph* %2 to i64*, !dbg !11384
+    alias entry   %93 = bitcast %class.Graph* %2 to i8**, !dbg !11392
+  alias entry   %98 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 6, !dbg !11399
+  alias entry   %99 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 6, i32 0, i32 0, i32 1, !dbg !11402
+    alias entry   %100 = bitcast i64** %99 to i64*, !dbg !11402
+    alias entry   %102 = bitcast %"class.std::vector.0"* %98 to i64*, !dbg !11403
+    alias entry   %107 = bitcast %"class.std::vector.0"* %98 to i8**, !dbg !11410
+  alias entry   %112 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 1, !dbg !11417
+  alias entry   %113 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 1, i32 0, i32 0, i32 1, !dbg !11424
+    alias entry   %114 = bitcast %struct.Edge** %113 to i64*, !dbg !11424
+    alias entry   %116 = bitcast %"class.std::vector.5"* %112 to i64*, !dbg !11428
+    alias entry   %121 = bitcast %"class.std::vector.5"* %112 to i8**, !dbg !11440
+Round 1
+Round end
+    load (9.999981e-01) from %class.Graph* %2
+Warning: wrong traversal order, or recursive call
+On function .omp_outlined..45
+Round 0
+Round end
+    call (1.058333e+01, 2.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %5
+    call (1.058333e+01, 1.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %6
+    call (1.058333e+01, 1.721875e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Edge* %7
+    call (1.058333e+01, 8.992188e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %8
+    call (1.058333e+01, 0.000000e+00, 1.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %9
+    call (1.058333e+01, 7.500000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using double* %10
+    call (1.058333e+01, 2.121875e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %11
+    call (1.058333e+01, 5.000000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %12
+    call (1.058333e+01, 5.000000e-01, 5.000000e-01, 0.000000e+00, 0.000000e+00) using double* %14
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 2.116667e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %6
+  load: 1.058333e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Edge* %7
+  load: 1.822318e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %8
+  load: 9.516732e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %9
+  load: 0.000000e+00		  store: 1.058333e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %10
+  load: 7.937500e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %11
+  load: 2.245651e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %12
+  load: 5.291667e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %14
+  load: 5.291667e+00		  store: 5.291667e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..46
+Round 0
+Round end
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %3
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %4
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Edge* %5
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %6
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %7
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %8
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %9
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %10
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %12
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..49
+Round 0
+  alias entry   %28 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (3.200000e-01) from %"class.std::vector.0"* %3
+    load (3.200000e-01) from i64** %4
+    load (3.200000e-01) from i64** %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %3
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64** %4
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64** %5
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function main
+Round 0
+    base alias entry   %14 = alloca i8**, align 8
+    alias entry   %33 = load i8**, i8*** %14, align 8, !dbg !10342, !tbaa !10335
+Round 1
+Round end
+  Frequency of i8** %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZN11GenerateRGGC2ElP19ompi_communicator_t
+Round 0
+  alias entry   %4 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 0, !dbg !10266
+  alias entry   %5 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10276
+    base alias entry   %5 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10276
+  alias entry   %6 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 5, !dbg !10279
+    alias entry   %8 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10281, !tbaa !10278
+  alias entry   %9 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 4, !dbg !10282
+  alias entry   %11 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 7, !dbg !10284
+  alias entry   %12 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 6, !dbg !10287
+  alias entry   %36 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 1, !dbg !10320
+    alias entry   %100 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10478, !tbaa !10278
+    alias entry   %171 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10565, !tbaa !10278
+  alias entry   %183 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2, !dbg !10579
+    alias entry   %190 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10583, !tbaa !10278
+Round 1
+Round end
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    load (1.000000e+00) from %class.GenerateRGG* %0
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    load (1.000000e+00) from %class.GenerateRGG* %0
+    load (5.000000e-01) from %class.GenerateRGG* %0
+    store (2.500000e-01) to %class.GenerateRGG* %0
+    store (3.437500e-01) to %class.GenerateRGG* %0
+    store (2.500000e-01) to %class.GenerateRGG* %0
+    load (1.000000e+00) from %class.GenerateRGG* %0
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    load (5.000000e-01) from %class.GenerateRGG* %0
+    load (5.000000e-01) from %class.GenerateRGG* %0
+    load (5.000000e-01) from %class.GenerateRGG* %0
+    load (7.656250e-01) from %class.GenerateRGG* %0
+    load (7.656250e-01) from %class.GenerateRGG* %0
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    load (1.000000e+00) from %class.GenerateRGG* %0
+    load (1.000000e+00) from %class.GenerateRGG* %0
+  Frequency of %class.GenerateRGG* %0
+  load: 8.531250e+00		  store: 6.843750e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.ompi_communicator_t* %2
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZN11GenerateRGG8generateEbbi
+Round 0
+  alias entry   %27 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 1, !dbg !10306
+  alias entry   %75 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 6, !dbg !10592
+  alias entry   %112 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 7, !dbg !10709
+  alias entry   %153 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 0, !dbg !10828
+  alias entry   %156 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 4, !dbg !10832
+  alias entry   %160 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 5, !dbg !10836
+  alias entry   %362 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10915
+  alias entry   %696 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !11101
+  alias entry   %772 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2
+  alias entry   %1095 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2
+  alias entry   %1388 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2
+Round 1
+Round end
+    load (1.000000e+00) from %class.GenerateRGG* %0
+    load (6.249994e-01) from %class.GenerateRGG* %0
+    load (9.999990e-01) from %class.GenerateRGG* %0
+    load (4.999995e-01) from %class.GenerateRGG* %0
+    load (3.124994e-01) from %class.GenerateRGG* %0
+    load (9.999985e-01) from %class.GenerateRGG* %0
+    load (4.999993e-01) from %class.GenerateRGG* %0
+    load (3.124992e-01) from %class.GenerateRGG* %0
+    load (9.999971e-01) from %class.GenerateRGG* %0
+    load (9.999971e-01) from %class.GenerateRGG* %0
+    load (9.999962e-01) from %class.GenerateRGG* %0
+    load (9.999962e-01) from %class.GenerateRGG* %0
+    load (4.999966e-01) from %class.GenerateRGG* %0
+    load (4.999971e-01) from %class.GenerateRGG* %0
+    load (4.999971e-01) from %class.GenerateRGG* %0
+    load (4.999966e-01) from %class.GenerateRGG* %0
+    load (9.999923e-01) from %class.GenerateRGG* %0
+    load (9.999914e-01) from %class.GenerateRGG* %0
+    load (3.749968e-01) from %class.GenerateRGG* %0
+    load (3.749964e-01) from %class.GenerateRGG* %0
+    load (9.999890e-01) from %class.GenerateRGG* %0
+    load (9.998746e-01) from %class.GenerateRGG* %0
+    load (3.199362e+02) from %class.GenerateRGG* %0
+    load (3.199361e+02) from %class.GenerateRGG* %0
+    load (6.249210e-01) from %class.GenerateRGG* %0
+    load (6.249210e-01) from %class.GenerateRGG* %0
+    load (9.998736e-01) from %class.GenerateRGG* %0
+    load (9.998736e-01) from %class.GenerateRGG* %0
+    load (9.998736e-01) from %class.GenerateRGG* %0
+    load (9.998736e-01) from %class.GenerateRGG* %0
+    load (9.998726e-01) from %class.GenerateRGG* %0
+    load (9.998726e-01) from %class.GenerateRGG* %0
+    load (9.998726e-01) from %class.GenerateRGG* %0
+    load (9.998726e-01) from %class.GenerateRGG* %0
+    load (9.998717e-01) from %class.GenerateRGG* %0
+    load (9.998717e-01) from %class.GenerateRGG* %0
+    load (9.998717e-01) from %class.GenerateRGG* %0
+    load (9.998717e-01) from %class.GenerateRGG* %0
+    load (9.998707e-01) from %class.GenerateRGG* %0
+    load (9.998707e-01) from %class.GenerateRGG* %0
+    load (9.998707e-01) from %class.GenerateRGG* %0
+    load (9.998707e-01) from %class.GenerateRGG* %0
+    load (9.998698e-01) from %class.GenerateRGG* %0
+    load (4.999349e-01) from %class.GenerateRGG* %0
+    load (2.499674e-01) from %class.GenerateRGG* %0
+    load (7.997451e+01) from %class.GenerateRGG* %0
+    load (3.998725e+01) from %class.GenerateRGG* %0
+    load (3.998725e+01) from %class.GenerateRGG* %0
+    load (7.997448e+01) from %class.GenerateRGG* %0
+    load (4.999063e-01) from %class.GenerateRGG* %0
+    load (2.499531e-01) from %class.GenerateRGG* %0
+    load (7.996993e+01) from %class.GenerateRGG* %0
+    load (3.998497e+01) from %class.GenerateRGG* %0
+    load (3.998497e+01) from %class.GenerateRGG* %0
+    load (7.996991e+01) from %class.GenerateRGG* %0
+    load (9.998126e-01) from %class.GenerateRGG* %0
+    load (9.998116e-01) from %class.GenerateRGG* %0
+    load (9.998116e-01) from %class.GenerateRGG* %0
+    load (9.998116e-01) from %class.GenerateRGG* %0
+    load (9.998107e-01) from %class.GenerateRGG* %0
+    load (9.998107e-01) from %class.GenerateRGG* %0
+    load (9.998107e-01) from %class.GenerateRGG* %0
+    load (9.998091e-01) from %class.GenerateRGG* %0
+    load (9.998091e-01) from %class.GenerateRGG* %0
+    load (9.998091e-01) from %class.GenerateRGG* %0
+    load (9.998082e-01) from %class.GenerateRGG* %0
+    load (9.998082e-01) from %class.GenerateRGG* %0
+    load (9.998082e-01) from %class.GenerateRGG* %0
+    load (9.998072e-01) from %class.GenerateRGG* %0
+    load (9.998015e-01) from %class.GenerateRGG* %0
+    load (6.248724e-01) from %class.GenerateRGG* %0
+    load (6.248718e-01) from %class.GenerateRGG* %0
+    load (1.952724e-01) from %class.GenerateRGG* %0
+    load (3.905445e-01) from %class.GenerateRGG* %0
+    load (3.905442e-01) from %class.GenerateRGG* %0
+    load (6.248393e-01) from %class.GenerateRGG* %0
+    load (1.249644e+01) from %class.GenerateRGG* %0
+    load (1.249643e+01) from %class.GenerateRGG* %0
+    load (1.171538e+00) from %class.GenerateRGG* %0
+    load (5.857690e-01) from %class.GenerateRGG* %0
+    load (2.928845e-01) from %class.GenerateRGG* %0
+    load (1.464422e-01) from %class.GenerateRGG* %0
+    load (6.248387e-01) from %class.GenerateRGG* %0
+    load (6.248381e-01) from %class.GenerateRGG* %0
+    load (1.249638e+01) from %class.GenerateRGG* %0
+    load (6.248253e-01) from %class.GenerateRGG* %0
+    load (3.905154e-01) from %class.GenerateRGG* %0
+    load (2.440719e-01) from %class.GenerateRGG* %0
+    load (6.248247e-01) from %class.GenerateRGG* %0
+    load (4.881438e+00) from %class.GenerateRGG* %0
+    load (9.997431e-01) from %class.GenerateRGG* %0
+    load (9.997421e-01) from %class.GenerateRGG* %0
+    load (9.997406e-01) from %class.GenerateRGG* %0
+    load (9.997406e-01) from %class.GenerateRGG* %0
+    load (1.999481e+01) from %class.GenerateRGG* %0
+    load (9.997388e-01) from %class.GenerateRGG* %0
+    load (9.997385e-01) from %class.GenerateRGG* %0
+  Frequency of %class.GenerateRGG* %0
+  load: 1.246995e+03		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZN14BinaryEdgeList4readEiiiSs
+Round 0
+  alias entry   %39 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 4, !dbg !10380
+  alias entry   %41 = getelementptr inbounds %"class.std::basic_string", %"class.std::basic_string"* %4, i64 0, i32 0, i32 0, !dbg !10388
+  alias entry   %99 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 0, !dbg !10514
+    alias entry   %100 = bitcast %class.BinaryEdgeList* %0 to i8*, !dbg !10515
+  alias entry   %104 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 1, !dbg !10518
+    alias entry   %105 = bitcast i64* %104 to i8*, !dbg !10519
+  alias entry   %118 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 2, !dbg !10532
+  alias entry   %182 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 3, !dbg !10605
+Round 1
+Round end
+    load (9.999971e-01) from %class.BinaryEdgeList* %0
+    load (9.999971e-01) from %"class.std::basic_string"* %4
+    load (6.249948e-01) from %class.BinaryEdgeList* %0
+    load (9.999905e-01) from %class.BinaryEdgeList* %0
+    store (9.999905e-01) to %class.BinaryEdgeList* %0
+    load (9.999895e-01) from %class.BinaryEdgeList* %0
+    load (9.999886e-01) from %class.BinaryEdgeList* %0
+    load (9.999886e-01) from %class.BinaryEdgeList* %0
+    load (9.999729e-01) from %class.BinaryEdgeList* %0
+    store (9.999729e-01) to %class.BinaryEdgeList* %0
+    load (9.999714e-01) from %class.BinaryEdgeList* %0
+    load (9.999714e-01) from %class.BinaryEdgeList* %0
+    load (9.999547e-01) from %class.BinaryEdgeList* %0
+    load (1.999909e+01) from %class.BinaryEdgeList* %0
+  Frequency of %class.BinaryEdgeList* %0
+  load: 2.962391e+01		  store: 1.999963e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::basic_string"* %4
+  load: 9.999971e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt8_Rb_treeIlSt4pairIKl4CommESt10_Select1stIS3_ESt4lessIlESaIS3_EE8_M_eraseEPSt13_Rb_tree_nodeIS3_E
+Round 0
+Round end
+Warning: wrong traversal order, or recursive call
+On function _ZN5GraphC2EllllP19ompi_communicator_t
+Round 0
+  alias entry   %8 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, !dbg !10272
+  alias entry   %9 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 1, !dbg !10272
+  alias entry   %10 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10309
+    alias entry   %11 = bitcast %class.Graph* %0 to i8*, !dbg !10309
+  alias entry   %12 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 3, !dbg !10320
+  alias entry   %13 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 4, !dbg !10322
+  alias entry   %14 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 5, !dbg !10324
+  alias entry   %15 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, !dbg !10272
+    alias entry   %16 = bitcast %"class.std::vector.0"* %15 to i8*, !dbg !10332
+  alias entry   %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10334
+    base alias entry   %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10334
+  alias entry   %18 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 9, !dbg !10336
+    alias entry   %21 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %17, align 8, !dbg !10338, !tbaa !10335
+  alias entry   %22 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 8, !dbg !10339
+  alias entry   %28 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !10361
+    alias entry   %29 = bitcast i64** %28 to i64*, !dbg !10361
+    alias entry   %31 = bitcast %class.Graph* %0 to i64*, !dbg !10365
+  alias entry   %45 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 1, i32 0, i32 0, i32 1, !dbg !10416
+    alias entry   %46 = bitcast %struct.Edge** %45 to i64*, !dbg !10416
+    alias entry   %48 = bitcast %"class.std::vector.5"* %9 to i64*, !dbg !10420
+  alias entry   %64 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1, !dbg !10455
+    alias entry   %65 = bitcast i64** %64 to i64*, !dbg !10455
+    alias entry   %67 = bitcast %"class.std::vector.0"* %15 to i64*, !dbg !10456
+  alias entry   %76 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %15, i64 0, i32 0, i32 0, i32 0
+  alias entry   %110 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %15, i64 0, i32 0, i32 0, i32 0, !dbg !10511
+  alias entry   %116 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %9, i64 0, i32 0, i32 0, i32 0, !dbg !10547
+  alias entry   %122 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, i32 0, i32 0, i32 0, !dbg !10576
+Round 1
+Round end
+    store (1.000000e+00) to %class.Graph* %0
+    store (1.000000e+00) to %class.Graph* %0
+    store (1.000000e+00) to %class.Graph* %0
+    store (1.000000e+00) to %class.Graph* %0
+    store (1.000000e+00) to %class.Graph* %0
+    load (9.999990e-01) from %class.Graph* %0
+    load (9.999980e-01) from %class.Graph* %0
+    load (9.999980e-01) from %class.Graph* %0
+    load (9.999980e-01) from %class.Graph* %0
+Warning: wrong traversal order, or recursive call
+On function _ZN3LCGC2EjPdlP19ompi_communicator_t
+Round 0
+  alias entry   %6 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 3, !dbg !10268
+  alias entry   %7 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 4, !dbg !10277
+  alias entry   %8 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 6, !dbg !10279
+    base alias entry   %8 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 6, !dbg !10279
+  alias entry   %9 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, !dbg !10281
+    alias entry   %10 = bitcast %"class.std::vector.0"* %9 to i8*, !dbg !10300
+  alias entry   %11 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0, !dbg !10302
+    base alias entry   %11 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0, !dbg !10302
+  alias entry   %12 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 1, !dbg !10306
+    alias entry   %15 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %11, align 8, !dbg !10308, !tbaa !10305
+  alias entry   %16 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 2, !dbg !10309
+  alias entry   %20 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, i32 0, i32 0, i32 1, !dbg !10326
+    alias entry   %21 = bitcast i64** %20 to i64*, !dbg !10326
+    alias entry   %23 = bitcast %"class.std::vector.0"* %9 to i64*, !dbg !10330
+  alias entry   %42 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 5, !dbg !10359
+  alias entry   %45 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %9, i64 0, i32 0, i32 0, i32 0, !dbg !10374
+  alias entry   %52 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 5, !dbg !10399
+    alias entry   %53 = bitcast i64* %52 to i8*, !dbg !10400
+    alias entry   %54 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %11, align 8, !dbg !10401, !tbaa !10305
+Round 1
+Round end
+    store (1.000000e+00) to %class.LCG* %0
+    store (1.000000e+00) to %class.LCG* %0
+    store (1.000000e+00) to %class.LCG* %0
+    store (1.000000e+00) to %class.LCG* %0
+    load (9.999989e-01) from %class.LCG* %0
+    load (9.999982e-01) from %class.LCG* %0
+    load (9.999982e-01) from %class.LCG* %0
+    load (9.999982e-01) from %class.LCG* %0
+Warning: wrong traversal order, or recursive call
+On function _ZNSt24uniform_int_distributionIiEclISt26linear_congruential_engineImLm16807ELm0ELm2147483647EEEEiRT_RKNS0_10param_typeE
+Round 0
+  alias entry   %5 = getelementptr inbounds %"struct.std::uniform_int_distribution<int>::param_type", %"struct.std::uniform_int_distribution<int>::param_type"* %2, i64 0, i32 1, !dbg !10267
+  alias entry   %8 = getelementptr inbounds %"struct.std::uniform_int_distribution<int>::param_type", %"struct.std::uniform_int_distribution<int>::param_type"* %2, i64 0, i32 0, !dbg !10279
+  alias entry   %19 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0
+  alias entry   %37 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0
+  alias entry   %51 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0, !dbg !10376
+Round 1
+Round end
+    load (1.000000e+00) from %"struct.std::uniform_int_distribution<int>::param_type"* %2
+    load (1.000000e+00) from %"struct.std::uniform_int_distribution<int>::param_type"* %2
+    load (5.000000e-01) from %"class.std::linear_congruential_engine"* %1
+    store (5.000000e-01) to %"class.std::linear_congruential_engine"* %1
+Warning: wrong traversal order, or recursive call
+On function _ZNSt6vectorIlSaIlEEaSERKS1_
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10278
+    alias entry   %6 = bitcast i64** %5 to i64*, !dbg !10278
+    alias entry   %8 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10285
+  alias entry   %12 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10294
+    alias entry   %13 = bitcast i64** %12 to i64*, !dbg !10294
+    alias entry   %15 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10296
+  alias entry   %.phi.trans.insert = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0
+  alias entry   %35 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10460
+  alias entry   %42 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10490
+    alias entry   %43 = bitcast i64** %42 to i64*, !dbg !10490
+  alias entry   %54 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 0, !dbg !10573
+  alias entry   %74 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10633
+  alias entry   %77 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10635
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.0"* %1
+    load (6.250000e-01) from %"class.std::vector.0"* %1
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (1.953125e-01) from %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    load (9.765625e-02) from %"class.std::vector.0"* %1
+    load (9.765625e-02) from %"class.std::vector.0"* %0
+    load (9.765625e-02) from %"class.std::vector.0"* %0
+    load (9.765625e-02) from %"class.std::vector.0"* %1
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    store (6.250000e-01) to %"class.std::vector.0"* %0
+  Frequency of %"class.std::vector.0"* %0
+  load: 2.578125e+00		  store: 1.250000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %1
+  load: 1.445312e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorIlSaIlEE14_M_fill_insertEN9__gnu_cxx17__normal_iteratorIPlS1_EEmRKl
+Round 0
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10281
+    alias entry   %9 = bitcast i64** %8 to i64*, !dbg !10281
+  alias entry   %11 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10288
+    alias entry   %12 = bitcast i64** %11 to i64*, !dbg !10288
+    alias entry   %543 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10728
+  alias entry   %729 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10820
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (3.125000e-01) from i64* %3
+    load (9.765625e-02) from %"class.std::vector.0"* %0
+    store (1.562500e-01) to %"class.std::vector.0"* %0
+    store (1.562500e-01) to %"class.std::vector.0"* %0
+    load (9.765625e-02) from %"class.std::vector.0"* %0
+    store (1.562500e-01) to %"class.std::vector.0"* %0
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    load (3.125000e-01) from i64* %3
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+  Frequency of %"class.std::vector.0"* %0
+  load: 2.382812e+00		  store: 1.406250e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %3
+  load: 6.250000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorI4EdgeSaIS0_EE17_M_default_appendEm
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274
+    alias entry   %6 = bitcast %struct.Edge** %5 to i64*, !dbg !10274
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281
+    alias entry   %83 = bitcast %"class.std::vector.5"* %0 to i64*, !dbg !10375
+  alias entry   %104 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 0
+  alias entry   %111 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10431
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.5"* %0
+    load (6.250000e-01) from %"class.std::vector.5"* %0
+    load (3.125000e-01) from %"class.std::vector.5"* %0
+    load (1.953125e-01) from %"class.std::vector.5"* %0
+    load (1.953125e-01) from %"class.std::vector.5"* %0
+    load (3.125000e-01) from %"class.std::vector.5"* %0
+    store (3.125000e-01) to %"class.std::vector.5"* %0
+    store (3.125000e-01) to %"class.std::vector.5"* %0
+  Frequency of %"class.std::vector.5"* %0
+  load: 2.265625e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZN3LCG18parallel_prefix_opEv
+Round 0
+  alias entry   %10 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 4, !dbg !10283
+  alias entry   %168 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 1, !dbg !10362
+  alias entry   %174 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 2, !dbg !10269
+  alias entry   %178 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0
+  alias entry   %186 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 5, !dbg !10373
+  alias entry   %250 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, i32 0, i32 0, i32 0, !dbg !10373
+Round 1
+Round end
+    load (1.000000e+00) from %class.LCG* %0
+    load (1.000000e+00) from %class.LCG* %0
+    load (1.000000e+00) from %class.LCG* %0
+    load (1.000000e+00) from %class.LCG* %0
+    load (2.005882e+01) from %class.LCG* %0
+    load (2.005882e+01) from %class.LCG* %0
+    load (2.005882e+01) from %class.LCG* %0
+    load (2.005882e+01) from %class.LCG* %0
+    load (1.000000e+00) from %class.LCG* %0
+  Frequency of %class.LCG* %0
+  load: 8.523529e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorI9EdgeTupleSaIS0_EE17_M_default_appendEm
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10273
+    alias entry   %6 = bitcast %struct.EdgeTuple** %5 to i64*, !dbg !10273
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10280
+    alias entry   %60 = bitcast %"class.std::vector.84"* %0 to i64*, !dbg !10369
+  alias entry   %81 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0
+  alias entry   %88 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10425
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.84"* %0
+    load (6.250000e-01) from %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    load (1.953125e-01) from %"class.std::vector.84"* %0
+    load (1.953125e-01) from %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    store (3.125000e-01) to %"class.std::vector.84"* %0
+    store (3.125000e-01) to %"class.std::vector.84"* %0
+  Frequency of %"class.std::vector.84"* %0
+  load: 2.578125e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZSt9__find_ifIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_E_ET_SC_SC_T0_St26random_access_iterator_tag
+Round 0
+Round end
+On function _ZNSt6vectorI9EdgeTupleSaIS0_EE15_M_range_insertIN9__gnu_cxx17__normal_iteratorIPS0_S2_EEEEvS7_T_S8_St20forward_iterator_tag
+Round 0
+  alias entry   %13 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10344
+    alias entry   %14 = bitcast %struct.EdgeTuple** %13 to i64*, !dbg !10344
+  alias entry   %16 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10351
+    alias entry   %17 = bitcast %struct.EdgeTuple** %16 to i64*, !dbg !10351
+    alias entry   %116 = bitcast %"class.std::vector.84"* %0 to i64*, !dbg !10799
+  alias entry   %137 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0
+  alias entry   %142 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10851
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.84"* %0
+    load (6.250000e-01) from %"class.std::vector.84"* %0
+    load (9.765625e-02) from %"class.std::vector.84"* %0
+    store (1.562500e-01) to %"class.std::vector.84"* %0
+    load (9.765625e-02) from %"class.std::vector.84"* %0
+    store (1.562500e-01) to %"class.std::vector.84"* %0
+    load (9.765625e-02) from %"class.std::vector.84"* %0
+    store (1.562500e-01) to %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    load (1.953125e-01) from %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    store (3.125000e-01) to %"class.std::vector.84"* %0
+    store (3.125000e-01) to %"class.std::vector.84"* %0
+    store (3.125000e-01) to %"class.std::vector.84"* %0
+  Frequency of %"class.std::vector.84"* %0
+  load: 2.675781e+00		  store: 1.406250e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZSt16__introsort_loopIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEElZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_T0_T1_
+Round 0
+Round end
+On function _ZSt22__final_insertion_sortIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_T0_
+Round 0
+Round end
+On function _ZSt13__heap_selectIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_SC_T0_
+Round 0
+Round end
+On function _ZSt13__adjust_heapIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEElS2_ZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_T0_SD_T1_T2_
+Round 0
+Round end
+On function _ZSt22__move_median_to_firstIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_SC_SC_T0_
+Round 0
+Round end
+On function _ZNSt6vectorIlSaIlEE17_M_default_appendEm
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274
+    alias entry   %6 = bitcast i64** %5 to i64*, !dbg !10274
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281
+    alias entry   %20 = bitcast i64** %8 to i64*, !dbg !10380
+    alias entry   %21 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10381
+  alias entry   %42 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0
+    alias entry   %65 = bitcast %"class.std::vector.0"* %0 to i8**, !dbg !10628
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    load (1.953125e-01) from %"class.std::vector.0"* %0
+    load (1.953125e-01) from %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+  Frequency of %"class.std::vector.0"* %0
+  load: 2.265625e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorIdSaIdEE17_M_default_appendEm
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274
+    alias entry   %6 = bitcast double** %5 to i64*, !dbg !10274
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281
+    alias entry   %20 = bitcast double** %8 to i64*, !dbg !10381
+    alias entry   %21 = bitcast %"class.std::vector.10"* %0 to i64*, !dbg !10382
+  alias entry   %42 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 0
+    alias entry   %65 = bitcast %"class.std::vector.10"* %0 to i8**, !dbg !10630
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.10"* %0
+    load (6.250000e-01) from %"class.std::vector.10"* %0
+    load (3.125000e-01) from %"class.std::vector.10"* %0
+    load (3.125000e-01) from %"class.std::vector.10"* %0
+    load (1.953125e-01) from %"class.std::vector.10"* %0
+    load (1.953125e-01) from %"class.std::vector.10"* %0
+    store (3.125000e-01) to %"class.std::vector.10"* %0
+    store (3.125000e-01) to %"class.std::vector.10"* %0
+  Frequency of %"class.std::vector.10"* %0
+  load: 2.265625e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorI4CommSaIS0_EE17_M_default_appendEm
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10460
+    alias entry   %6 = bitcast %struct.Comm** %5 to i64*, !dbg !10460
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10467
+    alias entry   %20 = bitcast %"class.std::vector.15"* %0 to i64*, !dbg !10551
+  alias entry   %41 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 0
+  alias entry   %48 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10607
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.15"* %0
+    load (6.250000e-01) from %"class.std::vector.15"* %0
+    load (3.125000e-01) from %"class.std::vector.15"* %0
+    load (3.125000e-01) from %"class.std::vector.15"* %0
+    load (1.953125e-01) from %"class.std::vector.15"* %0
+    load (1.953125e-01) from %"class.std::vector.15"* %0
+    load (3.125000e-01) from %"class.std::vector.15"* %0
+    store (3.125000e-01) to %"class.std::vector.15"* %0
+    store (3.125000e-01) to %"class.std::vector.15"* %0
+  Frequency of %"class.std::vector.15"* %0
+  load: 2.578125e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt27__uninitialized_default_n_1ILb0EE18__uninit_default_nIPSt13unordered_setIlSt4hashIlESt8equal_toIlESaIlEEmEEvT_T0_
+Round 0
+Round end
+  Frequency of %"class.std::unordered_set"* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt10_HashtableIlSt4pairIKllESaIS2_ENSt8__detail10_Select1stESt8equal_toIlESt4hashIlENS4_18_Mod_range_hashingENS4_20_Default_ranged_hashENS4_20_Prime_rehash_policyENS4_17_Hashtable_traitsILb0ELb0ELb1EEEE21_M_insert_unique_nodeEmmPNS4_10_Hash_nodeIS2_Lb0EEE
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 4, !dbg !10268
+  alias entry   %6 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 4, i32 1, !dbg !10275
+  alias entry   %8 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 1, !dbg !10282
+  alias entry   %10 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 3, !dbg !10288
+  alias entry   %17 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 0
+  alias entry   %29 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 2, i32 0, !dbg !10428
+    alias entry   %30 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to %"struct.std::__detail::_Hash_node"**, !dbg !10429
+  alias entry   %32 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %29, i64 0, i32 0, !dbg !10432
+    alias entry   %35 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to i64*
+    base alias entry   %44 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %28, i64 %43, !dbg !10509
+    alias entry   %61 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10529, !tbaa !10511
+  alias entry   %62 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %61, i64 0, i32 0, !dbg !10530
+    alias entry   %76 = bitcast %"class.std::_Hashtable"* %0 to i8**, !dbg !10550
+    alias entry   %82 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i8*, !dbg !10618
+  alias entry   %86 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 0, !dbg !10296
+  alias entry   %93 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, !dbg !10627
+    alias entry   %94 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i64*, !dbg !10628
+    base alias entry   %96 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %95, i64 0, i32 0, !dbg !10630
+  alias entry   %98 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 2, i32 0, !dbg !10639
+    alias entry   %99 = bitcast %"struct.std::__detail::_Hash_node_base"* %98 to i64*, !dbg !10640
+  alias entry   %101 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, !dbg !10641
+  alias entry   %102 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, i32 0, !dbg !10641
+    alias entry   %103 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i64*, !dbg !10642
+  alias entry   %104 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %98, i64 0, i32 0, !dbg !10645
+    base alias entry   %104 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %98, i64 0, i32 0, !dbg !10645
+    base alias entry   %113 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %84, i64 %112, !dbg !10676
+    base alias entry   %117 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %116, i64 %85, !dbg !10678
+Round 1
+Warning: the first offset is not constant
+    alias entry   %45 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10509, !tbaa !10511
+    alias entry   %57 = bitcast %"struct.std::__detail::_Hash_node_base"* %45 to i64*, !dbg !10525
+  base alias offset entry (0)   %95 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %87, align 8, !dbg !10629, !tbaa !10511
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Round 2
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Round end
+    load (1.000000e+00) from %"class.std::_Hashtable"* %0
+    load (1.000000e+00) from %"class.std::_Hashtable"* %0
+    load (1.000000e+00) from %"class.std::_Hashtable"* %0
+    load (5.000000e-01) from %"class.std::_Hashtable"* %0
+    load (4.999995e-01) from %"class.std::_Hashtable"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable"* %0
+    load (3.749996e+00) from %"class.std::_Hashtable"* %0
+    store (3.749996e+00) to %"class.std::_Hashtable"* %0
+    load (6.249994e+00) from %"class.std::_Hashtable"* %0
+    store (6.249994e+00) to %"class.std::_Hashtable"* %0
+    store (4.768372e-07) to %"class.std::_Hashtable"* %0
+    load (4.999995e-01) from %"class.std::_Hashtable"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable"* %0
+    store (6.249997e-01) to %"struct.std::__detail::_Hash_node"* %3
+    load (3.749998e-01) from %"class.std::_Hashtable"* %0
+    store (3.749998e-01) to %"struct.std::__detail::_Hash_node"* %3
+    store (3.749998e-01) to %"class.std::_Hashtable"* %0
+    load (3.749998e-01) from %"struct.std::__detail::_Hash_node"* %3
+    load (2.343749e-01) from %"class.std::_Hashtable"* %0
+    load (2.343749e-01) from %"class.std::_Hashtable"* %0
+    load (9.999995e-01) from %"class.std::_Hashtable"* %0
+    store (9.999995e-01) to %"class.std::_Hashtable"* %0
+  Frequency of %"class.std::_Hashtable"* %0
+  load: 1.634374e+01		  store: 1.287499e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"struct.std::__detail::_Hash_node"* %3
+  load: 3.749998e-01		  store: 9.999995e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt10_HashtableIllSaIlENSt8__detail9_IdentityESt8equal_toIlESt4hashIlENS1_18_Mod_range_hashingENS1_20_Default_ranged_hashENS1_20_Prime_rehash_policyENS1_17_Hashtable_traitsILb0ELb1ELb1EEEE21_M_insert_unique_nodeEmmPNS1_10_Hash_nodeIlLb0EEE
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 4, !dbg !10268
+  alias entry   %6 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 4, i32 1, !dbg !10275
+  alias entry   %8 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 1, !dbg !10282
+  alias entry   %10 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 3, !dbg !10288
+  alias entry   %17 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 0
+  alias entry   %29 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 2, i32 0, !dbg !10428
+    alias entry   %30 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to %"struct.std::__detail::_Hash_node.61"**, !dbg !10429
+  alias entry   %32 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %29, i64 0, i32 0, !dbg !10432
+    alias entry   %35 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to i64*
+    base alias entry   %44 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %28, i64 %43, !dbg !10469
+    alias entry   %61 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10489, !tbaa !10471
+  alias entry   %62 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %61, i64 0, i32 0, !dbg !10490
+    alias entry   %76 = bitcast %"class.std::_Hashtable.34"* %0 to i8**, !dbg !10510
+    alias entry   %82 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i8*, !dbg !10578
+  alias entry   %86 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 0, !dbg !10296
+  alias entry   %93 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, !dbg !10587
+    alias entry   %94 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i64*, !dbg !10588
+    base alias entry   %96 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %95, i64 0, i32 0, !dbg !10590
+  alias entry   %98 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 2, i32 0, !dbg !10599
+    alias entry   %99 = bitcast %"struct.std::__detail::_Hash_node_base"* %98 to i64*, !dbg !10600
+  alias entry   %101 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, !dbg !10601
+  alias entry   %102 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, i32 0, !dbg !10601
+    alias entry   %103 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i64*, !dbg !10602
+  alias entry   %104 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %98, i64 0, i32 0, !dbg !10605
+    base alias entry   %104 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %98, i64 0, i32 0, !dbg !10605
+    base alias entry   %113 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %84, i64 %112, !dbg !10630
+    base alias entry   %117 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %116, i64 %85, !dbg !10632
+Round 1
+Warning: the first offset is not constant
+    alias entry   %45 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10469, !tbaa !10471
+    alias entry   %57 = bitcast %"struct.std::__detail::_Hash_node_base"* %45 to i64*, !dbg !10485
+  base alias offset entry (0)   %95 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %87, align 8, !dbg !10589, !tbaa !10471
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Round 2
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Round end
+    load (1.000000e+00) from %"class.std::_Hashtable.34"* %0
+    load (1.000000e+00) from %"class.std::_Hashtable.34"* %0
+    load (1.000000e+00) from %"class.std::_Hashtable.34"* %0
+    load (5.000000e-01) from %"class.std::_Hashtable.34"* %0
+    load (4.999995e-01) from %"class.std::_Hashtable.34"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable.34"* %0
+    load (3.749996e+00) from %"class.std::_Hashtable.34"* %0
+    store (3.749996e+00) to %"class.std::_Hashtable.34"* %0
+    load (6.249994e+00) from %"class.std::_Hashtable.34"* %0
+    store (6.249994e+00) to %"class.std::_Hashtable.34"* %0
+    store (4.768372e-07) to %"class.std::_Hashtable.34"* %0
+    load (4.999995e-01) from %"class.std::_Hashtable.34"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable.34"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable.34"* %0
+    store (6.249997e-01) to %"struct.std::__detail::_Hash_node.61"* %3
+    load (3.749998e-01) from %"class.std::_Hashtable.34"* %0
+    store (3.749998e-01) to %"struct.std::__detail::_Hash_node.61"* %3
+    store (3.749998e-01) to %"class.std::_Hashtable.34"* %0
+    load (3.749998e-01) from %"struct.std::__detail::_Hash_node.61"* %3
+    load (2.343749e-01) from %"class.std::_Hashtable.34"* %0
+    load (2.343749e-01) from %"class.std::_Hashtable.34"* %0
+    load (9.999995e-01) from %"class.std::_Hashtable.34"* %0
+    store (9.999995e-01) to %"class.std::_Hashtable.34"* %0
+  Frequency of %"class.std::_Hashtable.34"* %0
+  load: 1.634374e+01		  store: 1.287499e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"struct.std::__detail::_Hash_node.61"* %3
+  load: 3.749998e-01		  store: 9.999995e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorI8CommInfoSaIS0_EE17_M_default_appendEm
+Round 0
+  alias entry   %7 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10273
+    alias entry   %8 = bitcast %struct.CommInfo** %7 to i64*, !dbg !10273
+  alias entry   %10 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10280
+    alias entry   %54 = bitcast %struct.CommInfo** %10 to i64*, !dbg !10394
+    alias entry   %55 = bitcast %"class.std::vector.52"* %0 to i64*, !dbg !10395
+  alias entry   %76 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 0
+  alias entry   %84 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10449
+    alias entry   %133 = bitcast %"class.std::vector.52"* %0 to i8**, !dbg !10651
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.52"* %0
+    load (6.250000e-01) from %"class.std::vector.52"* %0
+    load (3.125000e-01) from %"class.std::vector.52"* %0
+    load (3.125000e-01) from %"class.std::vector.52"* %0
+    load (1.953125e-01) from %"class.std::vector.52"* %0
+    load (1.953125e-01) from %"class.std::vector.52"* %0
+    load (3.125000e-01) from %"class.std::vector.52"* %0
+    store (3.125000e-01) to %"class.std::vector.52"* %0
+    store (3.125000e-01) to %"class.std::vector.52"* %0
+  Frequency of %"class.std::vector.52"* %0
+  load: 2.578125e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _GLOBAL__sub_I_main.cpp
+Round 0
+Round end
+On function .omp_offloading.descriptor_unreg
+Round 0
+Round end
+  Frequency of i8* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_offloading.descriptor_reg.nvptx64-nvidia-cuda
+Round 0
+Round end
+  ---- Identify Target Regions ----
+  target call:   %49 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z21distComputeModularityRK5GraphP4CommPKddi_l396.region_id, i32 5, i8** nonnull %24, i8** nonnull %26, i64* nonnull %28, i64* getelementptr inbounds ([5 x i64], [5 x i64]* @.offload_maptypes.0, i64 0, i64 0), i32 0, i32 0), !dbg !10317
+  target call:   %24 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %9, i8** nonnull %11, i64* nonnull %13, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0), !dbg !10269
+  target call:   %26 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %11, i8** nonnull %13, i64* nonnull %15, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0), !dbg !10269
+  target call:   %258 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20.1, i64 0, i64 0), i32 0, i32 0)
+          to label %259 unwind label %319, !dbg !11559
+  target call:   %276 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368.region_id, i32 12, i8** nonnull %200, i8** nonnull %202, i64* nonnull %204, i64* getelementptr inbounds ([12 x i64], [12 x i64]* @.offload_maptypes.47.2, i64 0, i64 0), i32 0, i32 0), !dbg !11584
+  target call:   %325 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15.3, i64 0, i64 0), i32 0, i32 0)
+          to label %326 unwind label %319, !dbg !11667
+  ---- Target Distance Calculation ----
+_Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi converges after 3 iterations
+target 0: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) 
+target 1: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) 
+target 2: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) 
+target 3: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 9.152967e+00) (4: 1.000095e+00) (5: 2.000190e+00) 
+target 4: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 8.152880e+00) (4: 9.091440e+00) (5: 1.000095e+00) 
+target 5: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 7.152791e+00) (4: 8.091353e+00) (5: 9.029914e+00) 
+  ---- OMP (/tmp/main-7b3dc0.bc, powerpc64le-unknown-linux-gnu) ----
+new entry   %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100)
+          to label %92 unwind label %291, !dbg !11387
+new entry   %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100)
+          to label %106 unwind label %295, !dbg !11405
+new entry   %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100)
+          to label %120 unwind label %299, !dbg !11431
+new entry   %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %129 unwind label %303, !dbg !11449
+new entry   %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %135 unwind label %307, !dbg !11462
+new entry   %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %141 unwind label %311, !dbg !11478
+new entry   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+new entry   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+new entry   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+Round 0
+  base alias entry   %130 = bitcast i64** %29 to i8**, !dbg !11450
+  base alias entry   %142 = bitcast i64** %30 to i8**, !dbg !11479
+  alias entry   %147 = bitcast i8* %145 to %struct.Comm*, !dbg !11487
+  alias entry   %158 = bitcast i8* %156 to double*, !dbg !11511
+  base alias entry   %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1
+  base alias entry   %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1
+  base alias entry   %192 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 2
+  base alias entry   %193 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 2
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias entry   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias entry   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias entry   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias entry   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias entry   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias entry   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias entry   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias entry   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias entry   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias entry   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias entry   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias entry   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias entry   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias entry   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias entry   %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1
+  base alias entry   %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1
+Warning: reach to function declaration __kmpc_fork_teams
+  alias entry (func arg) %struct.Comm* %1
+  alias entry (func arg) double* %2
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 1
+Round 1
+  base alias entry   %35 = bitcast i8** %34 to double**, !dbg !10317
+  base alias entry   %37 = bitcast i8** %36 to double**, !dbg !10317
+  base alias entry   %45 = bitcast i8** %44 to %struct.Comm**, !dbg !10317
+  base alias entry   %47 = bitcast i8** %46 to %struct.Comm**, !dbg !10317
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias entry   %29 = alloca i64*, align 8
+  base alias entry   %30 = alloca i64*, align 8
+  base alias offset entry (1)   %16 = alloca [3 x i8*], align 8
+  base alias offset entry (1)   %17 = alloca [3 x i8*], align 8
+  base alias offset entry (2)   %16 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %192 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 2
+  base alias offset entry (2)   %17 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %193 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 2
+  base alias offset entry (1)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (1)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (2)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-1)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (2)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-1)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (3)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-2)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (-1)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (3)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-2)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (-1)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (-3)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (-2)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (-1)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (-3)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (-2)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (-1)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (-4)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (-3)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (-2)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (-4)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (-3)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (-2)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (6)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-5)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (-4)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (-3)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (6)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-5)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (-4)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (-3)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (7)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-6)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (-5)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (-4)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (-1)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (7)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-6)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (-5)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (-4)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (-1)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (8)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-7)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-6)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-5)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-2)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-1)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (8)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-7)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-6)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-5)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-2)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-1)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-8)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-7)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-6)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-3)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-2)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-1)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-8)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-7)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-6)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-3)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-2)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-1)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (10)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-9)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-8)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-7)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-4)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-3)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-2)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (10)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-9)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-8)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-7)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-4)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-3)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-2)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-10)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-9)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-8)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-5)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-4)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-3)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-1)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-10)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-9)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-8)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-5)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-4)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-3)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-1)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %189, align 8, !dbg !11559
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %190, align 8, !dbg !11559
+Warning: reach to function declaration __kmpc_fork_teams
+  alias entry   %263 = load i64*, i64** %29, align 8, !dbg !11584, !tbaa !11451
+  alias entry   %264 = load i64*, i64** %30, align 8, !dbg !11584, !tbaa !11451
+  alias entry   %274 = ptrtoint i64* %263 to i64, !dbg !11584
+  alias entry   %275 = ptrtoint i64* %264 to i64, !dbg !11584
+  base alias entry   %215 = bitcast i8** %214 to i64*
+  base alias entry   %217 = bitcast i8** %216 to i64*
+  base alias entry   %220 = bitcast i8** %219 to i64*
+  base alias entry   %222 = bitcast i8** %221 to i64*
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 2
+Warning: reach to function declaration __kmpc_fork_call
+Round 2
+  base alias entry   %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317
+  base alias entry   %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317
+  base alias entry   %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317
+  base alias entry   %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias offset entry (2)   %11 = alloca [5 x i8*], align 8
+  base alias offset entry (2)   %12 = alloca [5 x i8*], align 8
+  base alias offset entry (-1)   %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317
+  base alias offset entry (-1)   %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317
+  base alias offset entry (4)   %11 = alloca [5 x i8*], align 8
+  base alias offset entry (-2)   %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317
+  base alias offset entry (4)   %12 = alloca [5 x i8*], align 8
+  base alias offset entry (-2)   %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias entry   %126 = bitcast i64** %29 to i8*, !dbg !11447
+  base alias entry   %139 = bitcast i64** %30 to i8*, !dbg !11477
+  base alias offset entry (1)   %184 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 0
+  base alias offset entry (2)   %184 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 0
+  base alias offset entry (1)   %186 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 0
+  base alias offset entry (2)   %186 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 0
+  base alias offset entry (1)   %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1
+  base alias offset entry (1)   %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1
+  base alias offset entry (1)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (2)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (3)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (6)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (7)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (8)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (10)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (1)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (2)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (3)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (6)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (7)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (8)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (10)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (1)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (2)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (5)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (6)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (7)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (9)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (1)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (2)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (5)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (6)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (7)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (9)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (1)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (4)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (5)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (6)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (8)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (1)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (4)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (5)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (6)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (8)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (3)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (4)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (5)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (7)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (3)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (4)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (5)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (7)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (2)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (3)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (4)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (6)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias entry   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (2)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (3)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (4)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (6)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias entry   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (1)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (2)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (3)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (5)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias entry   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (1)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (2)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (3)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (5)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias entry   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (1)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (2)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (4)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (1)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (2)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (4)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (1)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (3)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (1)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (3)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (2)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (2)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (1)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (1)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %189, align 8, !dbg !11559
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %190, align 8, !dbg !11559
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 3
+Warning: reach to function declaration __kmpc_fork_call
+Round 3
+  base alias offset entry (4)   %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317
+  base alias offset entry (2)   %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317
+  base alias offset entry (4)   %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317
+  base alias offset entry (2)   %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317
+  base alias offset entry (3)   %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317
+  base alias offset entry (1)   %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317
+  base alias offset entry (3)   %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317
+  base alias offset entry (1)   %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317
+  base alias offset entry (2)   %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317
+  base alias offset entry (2)   %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317
+  base alias offset entry (1)   %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317
+  base alias offset entry (1)   %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias offset entry (4)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (4)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (5)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-1)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (5)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-1)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (-2)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (-1)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (-2)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (-1)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (-3)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (-2)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (-3)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (-2)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (-4)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-3)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-4)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-3)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-5)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-4)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-5)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-4)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-6)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-5)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-6)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-5)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-7)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-6)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-7)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-6)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %189, align 8, !dbg !11559
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %190, align 8, !dbg !11559
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 4
+Warning: reach to function declaration __kmpc_fork_call
+Round 4
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias offset entry (4)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (5)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (4)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (5)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (3)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (4)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (3)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (4)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (2)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (3)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (2)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (3)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (1)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (2)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (1)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (2)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (1)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (1)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %189, align 8, !dbg !11559
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %190, align 8, !dbg !11559
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 5
+Warning: reach to function declaration __kmpc_fork_call
+Round 5
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %189, align 8, !dbg !11559
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %190, align 8, !dbg !11559
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 6
+Warning: reach to function declaration __kmpc_fork_call
+Round 6
+Warning: reach to function declaration __kmpc_fork_teams
+Round end
+  ---- Access Frequency Analysis ----
+  target call (1.625206e+01, 0.000000e+00, 5.076920e+00) using   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  target call (1.625206e+01, 0.000000e+00, 1.015380e+01) using   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+  target call (1.625204e+01, 1.015380e+01, 0.000000e+00) using   %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100)
+          to label %92 unwind label %291, !dbg !11387
+  target call (1.625204e+01, 5.076920e+00, 0.000000e+00) using   %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100)
+          to label %106 unwind label %295, !dbg !11405
+  target call (1.625204e+01, 8.757690e+01, 0.000000e+00) using   %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100)
+          to label %120 unwind label %299, !dbg !11431
+  target call (1.625204e+01, 4.569230e+01, 0.000000e+00) using   %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %129 unwind label %303, !dbg !11449
+  target call (1.625204e+01, 0.000000e+00, 5.076920e+00) using   %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %141 unwind label %311, !dbg !11478
+  target call (1.625204e+01, 3.807690e+00, 0.000000e+00) using   %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %135 unwind label %307, !dbg !11462
+  target call (1.625204e+01, 1.078710e+02, 0.000000e+00) using   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  target call (1.625204e+01, 2.538460e+00, 0.000000e+00) using   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+  target call (1.625204e+01, 2.538460e+00, 2.538460e+00) using   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+  target call (1.625202e+01, 1.015380e+01, 1.015380e+01) using   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  target call (1.625202e+01, 1.015380e+01, 0.000000e+00) using   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+  call (1.625199e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  call (1.625199e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+Frequency of   %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100)
+          to label %92 unwind label %291, !dbg !11387
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.650200e+02		  store: 0.000000e+00 (target)
+Frequency of   %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100)
+          to label %106 unwind label %295, !dbg !11405
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 8.251031e+01		  store: 0.000000e+00 (target)
+Frequency of   %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100)
+          to label %120 unwind label %299, !dbg !11431
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.423303e+03		  store: 0.000000e+00 (target)
+Frequency of   %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %129 unwind label %303, !dbg !11449
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 7.425931e+02		  store: 0.000000e+00 (target)
+Frequency of   %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %135 unwind label %307, !dbg !11462
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 6.188273e+01		  store: 0.000000e+00 (target)
+Frequency of   %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %141 unwind label %311, !dbg !11478
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 8.251031e+01 (target)
+Frequency of   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.918144e+03		  store: 2.475302e+02 (target)
+Frequency of   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 2.062750e+02		  store: 1.650201e+02 (target)
+Frequency of   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 4.125515e+01		  store: 4.125515e+01 (target)
+  ---- Optimization Preparation ----
+Rank 9 for   %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %135 unwind label %307, !dbg !11462
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 6.188273e+01		  store: 0.000000e+00 (target)
+Rank 8 for   %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100)
+          to label %106 unwind label %295, !dbg !11405
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 8.251031e+01		  store: 0.000000e+00 (target)
+Rank 7 for   %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %141 unwind label %311, !dbg !11478
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 8.251031e+01 (target)
+Rank 6 for   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 4.125515e+01		  store: 4.125515e+01 (target)
+Rank 5 for   %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100)
+          to label %92 unwind label %291, !dbg !11387
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.650200e+02		  store: 0.000000e+00 (target)
+Rank 4 for   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 2.062750e+02		  store: 1.650201e+02 (target)
+Rank 3 for   %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %129 unwind label %303, !dbg !11449
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 7.425931e+02		  store: 0.000000e+00 (target)
+Rank 2 for   %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100)
+          to label %120 unwind label %299, !dbg !11431
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.423303e+03		  store: 0.000000e+00 (target)
+Rank 1 for   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.918144e+03		  store: 2.475302e+02 (target)
+  ---- Data Mapping Optimization ----
+  target call:   %49 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z21distComputeModularityRK5GraphP4CommPKddi_l396.region_id, i32 5, i8** nonnull %24, i8** nonnull %26, i64* nonnull %28, i64* getelementptr inbounds ([5 x i64], [5 x i64]* @.offload_maptypes.0, i64 0, i64 0), i32 0, i32 0), !dbg !10317
+@.offload_maptypes.0 = private unnamed_addr constant [5 x i64] [i64 800, i64 547, i64 1100853829665, i64 547, i64 1102195986465]
+  arg 2 (0.000000e+00, 0.000000e+00; 4.125515e+01, 4.125515e+01) is   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+    size is   %127 = shl i64 %69, 3, !dbg !11448
+    global reuse is 0x06
+    local reuse is 1.600380e+02, 1.280304e+03 after adjustment;		    scaled local reuse is 0x500
+    reuse distance is 0x01
+  arg 4 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x01
+    local reuse is 1.600380e+02, 2.560608e+03 after adjustment;		    scaled local reuse is 0xa00
+    reuse distance is 0x01
+  target call:   %24 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %9, i8** nonnull %11, i64* nonnull %13, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0), !dbg !10269
+@.offload_maptypes.15 = private unnamed_addr constant [3 x i64] [i64 800, i64 35, i64 33]
+  target call:   %26 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %11, i8** nonnull %13, i64* nonnull %15, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0), !dbg !10269
+@.offload_maptypes.20 = private unnamed_addr constant [3 x i64] [i64 800, i64 34, i64 34]
+  target call:   %258 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20.1, i64 0, i64 0), i32 0, i32 0)
+          to label %259 unwind label %319, !dbg !11559
+@.offload_maptypes.20.1 = private unnamed_addr constant [3 x i64] [i64 800, i64 1099553574946, i64 1099681513506]
+  arg 1 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x01
+    local reuse is 5.076920e+00, 4.061536e+01 after adjustment;		    scaled local reuse is 0x028
+    reuse distance is 0x01
+  arg 2 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x04
+    local reuse is 1.015380e+01, 1.624608e+02 after adjustment;		    scaled local reuse is 0x0a2
+    reuse distance is 0x01
+  target call:   %276 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368.region_id, i32 12, i8** nonnull %200, i8** nonnull %202, i64* nonnull %204, i64* getelementptr inbounds ([12 x i64], [12 x i64]* @.offload_maptypes.47.2, i64 0, i64 0), i32 0, i32 0), !dbg !11584
+@.offload_maptypes.47.2 = private unnamed_addr constant [12 x i64] [i64 800, i64 9895689605153, i64 9895646625825, i64 9897073713185, i64 9895987392545, i64 9895646621730, i64 9895636144161, i64 1101320425505, i64 1099553587235, i64 800, i64 9895646617635, i64 800]
+  arg 1 (0.000000e+00, 0.000000e+00; 1.650200e+02, 0.000000e+00) is   %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100)
+          to label %92 unwind label %291, !dbg !11387
+    size is   %90 = sub i64 %87, %89, !dbg !11386
+    global reuse is 0x05
+    local reuse is 1.015380e+01, 8.123040e+01 after adjustment;		    scaled local reuse is 0x051
+    reuse distance is 0x09
+  arg 2 (0.000000e+00, 0.000000e+00; 8.251031e+01, 0.000000e+00) is   %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100)
+          to label %106 unwind label %295, !dbg !11405
+    size is   %104 = sub i64 %101, %103, !dbg !11404
+    global reuse is 0x08
+    local reuse is 5.076920e+00, 4.061536e+01 after adjustment;		    scaled local reuse is 0x028
+    reuse distance is 0x09
+  arg 3 (0.000000e+00, 0.000000e+00; 1.423303e+03, 0.000000e+00) is   %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100)
+          to label %120 unwind label %299, !dbg !11431
+    size is   %118 = sub i64 %115, %117, !dbg !11430
+    global reuse is 0x02
+    local reuse is 8.757690e+01, 1.401230e+03 after adjustment;		    scaled local reuse is 0x579
+    reuse distance is 0x09
+  arg 4 (0.000000e+00, 0.000000e+00; 7.425931e+02, 0.000000e+00) is   %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %129 unwind label %303, !dbg !11449
+    size is   %127 = shl i64 %69, 3, !dbg !11448
+    global reuse is 0x03
+    local reuse is 4.569230e+01, 3.655384e+02 after adjustment;		    scaled local reuse is 0x16d
+    reuse distance is 0x09
+  arg 5 (0.000000e+00, 0.000000e+00; 0.000000e+00, 8.251031e+01) is   %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %141 unwind label %311, !dbg !11478
+    size is   %127 = shl i64 %69, 3, !dbg !11448
+    global reuse is 0x07
+    local reuse is 5.076920e+00, 4.061536e+01 after adjustment;		    scaled local reuse is 0x028
+    reuse distance is 0x09
+  arg 6 (0.000000e+00, 0.000000e+00; 6.188273e+01, 0.000000e+00) is   %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %135 unwind label %307, !dbg !11462
+    size is   %127 = shl i64 %69, 3, !dbg !11448
+    global reuse is 0x09
+    local reuse is 3.807690e+00, 3.046152e+01 after adjustment;		    scaled local reuse is 0x01e
+    reuse distance is 0x09
+  arg 7 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x01
+    local reuse is 1.078710e+02, 1.725936e+03 after adjustment;		    scaled local reuse is 0x6bd
+    reuse distance is 0x01
+  arg 8 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x04
+    local reuse is 2.538460e+00, 4.061536e+01 after adjustment;		    scaled local reuse is 0x028
+    reuse distance is 0x01
+  arg 10 (0.000000e+00, 0.000000e+00; 4.125515e+01, 4.125515e+01) is   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+    size is   %127 = shl i64 %69, 3, !dbg !11448
+    global reuse is 0x06
+    local reuse is 5.076920e+00, 4.061536e+01 after adjustment;		    scaled local reuse is 0x028
+    reuse distance is 0x09
+  target call:   %325 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15.3, i64 0, i64 0), i32 0, i32 0)
+          to label %326 unwind label %319, !dbg !11667
+@.offload_maptypes.15.3 = private unnamed_addr constant [3 x i64] [i64 800, i64 7696921137187, i64 7696751280161]
+  arg 1 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x01
+    local reuse is 2.030760e+01, 3.249216e+02 after adjustment;		    scaled local reuse is 0x144
+    reuse distance is 0x07
+  arg 2 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x04
+    local reuse is 1.015380e+01, 1.624608e+02 after adjustment;		    scaled local reuse is 0x0a2
+    reuse distance is 0x07
+mpicxx main.o -O3 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -DOMP_GPU -DOMP_GPU_ALLOC -DCHECK_NUM_EDGES  -o miniVite
diff --git a/miniVite/logcmplll b/miniVite/logcmplll
new file mode 100644
index 0000000..2c49c24
--- /dev/null
+++ b/miniVite/logcmplll
@@ -0,0 +1,5651 @@
+mpicxx -std=c++11 -g -O3 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -DOMP_GPU -DOMP_GPU_ALLOC -DCHECK_NUM_EDGES  -Xclang -load -Xclang ~/git/unifiedmem/code/llvm-pass/build/uvm/libOMPPass.so -emit-llvm -S -c -o main.ll main.cpp
+In file included from main.cpp:58:
+In file included from ./dspl_gpu_kernel.hpp:58:
+In file included from ./graph.hpp:56:
+./utils.hpp:263:56: warning: using floating point absolute value function 'fabs' when argument is of integer type [-Wabsolute-value]
+                drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult); // 0-1
+                                                       ^
+./utils.hpp:263:56: note: use function 'std::abs' instead
+                drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult); // 0-1
+                                                       ^~~~
+                                                       std::abs
+  ---- Function Argument Access Frequency CG Analysis ----
+On function _Z7is_pwr2i
+Round 0
+Round end
+On function _Z8reseederj
+Round 0
+Round end
+On function _ZNSt8seed_seq8generateIN9__gnu_cxx17__normal_iteratorIPjSt6vectorIjSaIjEEEEEEvT_S8_
+Round 0
+  alias entry   %18 = getelementptr inbounds %"class.std::seed_seq", %"class.std::seed_seq"* %0, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !10369
+    alias entry   %19 = bitcast i32** %18 to i64*, !dbg !10369
+    alias entry   %21 = bitcast %"class.std::seed_seq"* %0 to i64*, !dbg !10376
+Round 1
+Round end
+    load (6.274510e-01) from %"class.std::seed_seq"* %0
+    load (6.274510e-01) from %"class.std::seed_seq"* %0
+  Frequency of %"class.std::seed_seq"* %0
+  load: 1.254902e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z4lockv
+Round 0
+Round end
+On function _Z6unlockv
+Round 0
+Round end
+On function _Z19distSumVertexDegreeRK5GraphRSt6vectorIdSaIdEERS2_I4CommSaIS6_EE
+Round 0
+  alias entry   %6 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10459
+Round 1
+Round end
+    load (1.000000e+00) from %class.Graph* %0
+  Frequency of %class.Graph* %0
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.10"* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.15"* %2
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function __clang_call_terminate
+Round 0
+Round end
+  Frequency of i8* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined.
+Round 0
+  alias entry   %25 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 0, i32 0, i32 0, i32 0
+  alias entry   %26 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 1, i32 0, i32 0, i32 0
+  alias entry   %27 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %28 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %5, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (6.350000e+00) from %class.Graph* %3
+    load (6.350000e+00) from %class.Graph* %3
+    load (6.350000e+00) from %"class.std::vector.10"* %4
+    load (6.350000e+00) from %"class.std::vector.15"* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %class.Graph* %3
+  load: 1.270000e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.10"* %4
+  load: 6.350000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.15"* %5
+  load: 6.350000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z29distCalcConstantForSecondTermRKSt6vectorIdSaIdEEP19ompi_communicator_t
+Round 0
+  alias entry   %9 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10283
+    alias entry   %10 = bitcast double** %9 to i64*, !dbg !10283
+    alias entry   %12 = bitcast %"class.std::vector.10"* %0 to i64*, !dbg !10288
+Round 1
+Round end
+    load (1.000000e+00) from %"class.std::vector.10"* %0
+    load (1.000000e+00) from %"class.std::vector.10"* %0
+  Frequency of %"class.std::vector.10"* %0
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.ompi_communicator_t* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func
+Round 0
+    alias entry   %3 = bitcast i8* %1 to double**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to double**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..2
+Round 0
+  alias entry   %32 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 0
+    alias entry   %102 = bitcast double* %3 to i64*, !dbg !10325
+Round 1
+Round end
+    load (3.157895e-01) from %"class.std::vector.10"* %4
+    load (2.105263e-01) from double* %3
+    store (2.105263e-01) to double* %3
+    load (2.105263e-01) from double* %3
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %3
+  load: 4.210526e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.10"* %4
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z12distInitCommRSt6vectorIlSaIlEES2_l
+Round 0
+  alias entry   %6 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10273
+    alias entry   %7 = bitcast i64** %6 to i64*, !dbg !10273
+    alias entry   %9 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10280
+Round 1
+Round end
+    load (1.000000e+00) from %"class.std::vector.0"* %1
+    load (1.000000e+00) from %"class.std::vector.0"* %1
+  Frequency of %"class.std::vector.0"* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %1
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..4
+Round 0
+  alias entry   %29 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+  alias entry   %31 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (3.200000e-01) from %"class.std::vector.0"* %3
+    load (3.200000e-01) from %"class.std::vector.0"* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %3
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %5
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z15distInitLouvainRK5GraphRSt6vectorIlSaIlEES5_RS2_IdSaIdEES8_RS2_I4CommSaIS9_EESC_Rdi
+Round 0
+  alias entry   %16 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !10485
+  alias entry   %20 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10502
+  alias entry   %22 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10514
+  alias entry   %24 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %3, i64 0, i32 0, i32 0, i32 1, !dbg !10532
+    alias entry   %25 = bitcast double** %24 to i64*, !dbg !10532
+    alias entry   %27 = bitcast %"class.std::vector.10"* %3 to i64*, !dbg !10536
+  alias entry   %40 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10572
+    alias entry   %41 = bitcast i64** %40 to i64*, !dbg !10572
+    alias entry   %43 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10574
+  alias entry   %56 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %2, i64 0, i32 0, i32 0, i32 1, !dbg !10600
+    alias entry   %57 = bitcast i64** %56 to i64*, !dbg !10600
+    alias entry   %59 = bitcast %"class.std::vector.0"* %2 to i64*, !dbg !10601
+  alias entry   %72 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 1, !dbg !10622
+    alias entry   %73 = bitcast double** %72 to i64*, !dbg !10622
+    alias entry   %75 = bitcast %"class.std::vector.10"* %4 to i64*, !dbg !10623
+  alias entry   %88 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %5, i64 0, i32 0, i32 0, i32 1, !dbg !10654
+    alias entry   %89 = bitcast %struct.Comm** %88 to i64*, !dbg !10654
+    alias entry   %91 = bitcast %"class.std::vector.15"* %5 to i64*, !dbg !10658
+  alias entry   %104 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %6, i64 0, i32 0, i32 0, i32 1, !dbg !10685
+    alias entry   %105 = bitcast %struct.Comm** %104 to i64*, !dbg !10685
+    alias entry   %107 = bitcast %"class.std::vector.15"* %6 to i64*, !dbg !10686
+Round 1
+Round end
+    load (1.000000e+00) from %class.Graph* %0
+    load (1.000000e+00) from %class.Graph* %0
+    load (1.000000e+00) from %class.Graph* %0
+    load (1.000000e+00) from %"class.std::vector.10"* %3
+    load (1.000000e+00) from %"class.std::vector.10"* %3
+Warning: wrong traversal order, or recursive call
+On function _Z15distGetMaxIndexP7clmap_tRiPdS1_dPK4Commdldllld
+Round 0
+  alias entry   %22 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 %21
+  alias entry   %28 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 0, !dbg !10320
+  alias entry   %33 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %32, i32 1, !dbg !10330
+  alias entry   %35 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %32, i32 0, !dbg !10333
+  alias entry   %37 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 1, !dbg !10335
+  alias entry   %41 = getelementptr inbounds double, double* %2, i64 %38, !dbg !10340
+  alias entry   %60 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 1, !dbg !10352
+  alias entry   %81 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %80, i32 1, !dbg !10330
+  alias entry   %83 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %80, i32 0, !dbg !10333
+  alias entry   %89 = getelementptr inbounds double, double* %2, i64 %86, !dbg !10340
+  alias entry   %126 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %125, i32 1, !dbg !10330
+  alias entry   %128 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %125, i32 0, !dbg !10333
+  alias entry   %134 = getelementptr inbounds double, double* %2, i64 %131, !dbg !10340
+Round 1
+Round end
+    load (1.000000e+00) from double* %2
+    load (1.000000e+00) from i32* %3
+    load (1.000000e+00) from i32* %1
+    load (5.000000e-01) from %struct.clmap_t* %0
+    load (2.500000e-01) from %struct.Comm* %5
+    load (2.500000e-01) from %struct.Comm* %5
+    load (2.500000e-01) from %struct.clmap_t* %0
+    load (1.250000e-01) from double* %2
+    load (9.984375e+00) from %struct.Comm* %5
+    load (9.984375e+00) from %struct.Comm* %5
+    load (4.984375e+00) from double* %2
+    load (9.984375e+00) from %struct.Comm* %5
+    load (9.984375e+00) from %struct.Comm* %5
+    load (4.984375e+00) from double* %2
+  Frequency of %struct.clmap_t* %0
+  load: 7.500000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %2
+  load: 1.109375e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %3
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %5
+  load: 4.043750e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z24distBuildLocalMapCounterllP7clmap_tRiPdS1_PK4EdgePKllll
+Round 0
+  alias entry   %21 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %20, i32 0, !dbg !10308
+  alias entry   %22 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %20, i32 1, !dbg !10310
+  alias entry   %31 = getelementptr inbounds i64, i64* %7, i64 %30, !dbg !10326
+  alias entry   %39 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %37, i32 0, !dbg !10337
+  alias entry   %48 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %35, !dbg !10348
+  alias entry   %58 = getelementptr inbounds double, double* %4, i64 %52, !dbg !10358
+  alias entry   %64 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %35, i32 0, !dbg !10364
+  alias entry   %65 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %35, i32 1, !dbg !10367
+    alias entry   %71 = bitcast double* %22 to i64*, !dbg !10375
+  alias entry   %74 = getelementptr inbounds double, double* %4, i64 %73, !dbg !10377
+    alias entry   %75 = bitcast double* %74 to i64*, !dbg !10378
+Round 1
+Round end
+    load (1.593750e+01) from %struct.Edge* %6
+    load (7.937500e+00) from %struct.Edge* %6
+    load (1.593750e+01) from i64* %7
+    load (1.593750e+01) from i32* %3
+    load (1.625000e+02) from %struct.clmap_t* %2
+    load (9.937500e+00) from i32* %5
+    load (4.937500e+00) from %struct.Edge* %6
+    load (4.937500e+00) from double* %4
+    store (4.937500e+00) to double* %4
+    store (5.437500e+00) to %struct.clmap_t* %2
+    store (5.437500e+00) to %struct.clmap_t* %2
+    store (5.437500e+00) to i32* %3
+    load (1.093750e+01) from i32* %5
+    load (5.437500e+00) from %struct.Edge* %6
+    store (5.437500e+00) to double* %4
+    store (5.437500e+00) to i32* %5
+  Frequency of %struct.clmap_t* %2
+  load: 1.625000e+02		  store: 1.087500e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %3
+  load: 1.593750e+01		  store: 5.437500e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %4
+  load: 4.937500e+00		  store: 1.037500e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %5
+  load: 2.087500e+01		  store: 5.437500e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Edge* %6
+  load: 3.425000e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %7
+  load: 1.593750e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z27distExecuteLouvainIterationlPKlS0_PK4EdgeS0_PlPKdP4CommS8_dPdi
+Round 0
+  alias entry   %18 = getelementptr inbounds i64, i64* %2, i64 %17, !dbg !10316
+  alias entry   %20 = getelementptr inbounds i64, i64* %4, i64 %0, !dbg !10322
+  alias entry   %23 = getelementptr inbounds i64, i64* %1, i64 %0, !dbg !10329
+  alias entry   %26 = getelementptr inbounds i64, i64* %1, i64 %25, !dbg !10332
+  alias entry   %30 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %22, i32 0, !dbg !10337
+  alias entry   %32 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %22, i32 1, !dbg !10341
+  alias entry   %47 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %46, i32 0, !dbg !10401
+  alias entry   %48 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %46, i32 1, !dbg !10403
+  alias entry   %57 = getelementptr inbounds i64, i64* %4, i64 %56, !dbg !10414
+    alias entry   %95 = bitcast double* %48 to i64*, !dbg !10457
+  alias entry   %118 = getelementptr inbounds double, double* %10, i64 %0, !dbg !10470
+  alias entry   %122 = getelementptr inbounds double, double* %6, i64 %0, !dbg !10473
+  alias entry   %140 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %139, i32 1, !dbg !10533
+  alias entry   %142 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %139, i32 0, !dbg !10534
+  alias entry   %188 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %187, i32 1, !dbg !10533
+  alias entry   %190 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %187, i32 0, !dbg !10534
+  alias entry   %236 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %235, i32 1, !dbg !10572
+    alias entry   %237 = bitcast double* %236 to i64*, !dbg !10573
+  alias entry   %248 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %235, i32 0, !dbg !10575
+  alias entry   %250 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %22, i32 1, !dbg !10578
+    alias entry   %252 = bitcast double* %250 to i64*, !dbg !10581
+  alias entry   %263 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %22, i32 0, !dbg !10583
+  alias entry   %267 = getelementptr inbounds i64, i64* %5, i64 %0, !dbg !10587
+  alias entry   %270 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %269, i32 1, !dbg !10533
+  alias entry   %272 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %269, i32 0, !dbg !10534
+Round 1
+Round end
+    load (1.000000e+00) from i64* %2
+    load (1.000000e+00) from i64* %4
+    load (1.000000e+00) from i64* %1
+    load (1.000000e+00) from i64* %1
+    load (5.000000e-01) from %struct.Comm* %7
+    load (5.000000e-01) from %struct.Comm* %7
+    load (7.992188e+00) from %struct.Edge* %3
+    load (3.992188e+00) from %struct.Edge* %3
+    load (7.992188e+00) from i64* %4
+    load (2.492188e+00) from %struct.Edge* %3
+    load (2.742188e+00) from %struct.Edge* %3
+    load (5.000000e-01) from double* %10
+    store (5.000000e-01) to double* %10
+    load (5.000000e-01) from double* %6
+    load (1.250000e-01) from %struct.Comm* %7
+    load (1.250000e-01) from %struct.Comm* %7
+    load (4.992188e+00) from %struct.Comm* %7
+    load (4.992188e+00) from %struct.Comm* %7
+    load (2.500000e-01) from %struct.Comm* %8
+    load (2.500000e-01) from double* %6
+    load (2.500000e-01) from %struct.Comm* %8
+    store (1.000000e+00) to i64* %5
+    load (4.992188e+00) from %struct.Comm* %7
+    load (4.992188e+00) from %struct.Comm* %7
+  Frequency of i64* %1
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Edge* %3
+  load: 1.721875e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %4
+  load: 8.992188e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 0.000000e+00		  store: 1.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %6
+  load: 7.500000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %7
+  load: 2.121875e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %8
+  load: 5.000000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %10
+  load: 5.000000e-01		  store: 5.000000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z21distComputeModularityRK5GraphP4CommPKddi
+Round 0
+  alias entry   %14 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10288
+  alias entry   %16 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10304
+    base alias entry   %35 = bitcast i8** %34 to double**, !dbg !10317
+    base alias entry   %37 = bitcast i8** %36 to double**, !dbg !10317
+    base alias entry   %45 = bitcast i8** %44 to %struct.Comm**, !dbg !10317
+    base alias entry   %47 = bitcast i8** %46 to %struct.Comm**, !dbg !10317
+Round 1
+  base alias entry   %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317
+  base alias entry   %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317
+  base alias entry   %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317
+  base alias entry   %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317
+Round 2
+  base alias offset entry (2)   %11 = alloca [5 x i8*], align 8
+  base alias offset entry (2)   %12 = alloca [5 x i8*], align 8
+  base alias offset entry (-1)   %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317
+  base alias offset entry (-1)   %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317
+  base alias offset entry (4)   %11 = alloca [5 x i8*], align 8
+  base alias offset entry (-2)   %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317
+  base alias offset entry (4)   %12 = alloca [5 x i8*], align 8
+  base alias offset entry (-2)   %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317
+Round 3
+  base alias offset entry (4)   %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317
+  base alias offset entry (2)   %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317
+  base alias offset entry (4)   %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317
+  base alias offset entry (2)   %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317
+  base alias offset entry (3)   %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317
+  base alias offset entry (1)   %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317
+  base alias offset entry (3)   %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317
+  base alias offset entry (1)   %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317
+  base alias offset entry (2)   %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317
+  base alias offset entry (2)   %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317
+  base alias offset entry (1)   %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317
+  base alias offset entry (1)   %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317
+Round 4
+Round end
+    load (1.000000e+00) from %class.Graph* %0
+    load (1.000000e+00) from %class.Graph* %0
+  Frequency of %class.Graph* %0
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %2
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.7
+Round 0
+    alias entry   %3 = bitcast i8* %1 to double**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to double**, !dbg !10261
+  alias entry   %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261
+    alias entry   %8 = bitcast i8* %7 to double**, !dbg !10261
+  alias entry   %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261
+    alias entry   %11 = bitcast i8* %10 to double**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..8
+Round 0
+  alias entry   %40 = getelementptr inbounds double, double* %6, i64 %39, !dbg !10318
+  alias entry   %43 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %39, i32 1, !dbg !10321
+    alias entry   %63 = bitcast double* %5 to i64*, !dbg !10329
+    alias entry   %75 = bitcast double* %7 to i64*, !dbg !10329
+Round 1
+Round end
+    load (1.010526e+01) from double* %6
+    load (1.010526e+01) from %struct.Comm* %8
+    load (2.105263e-01) from double* %5
+    store (2.105263e-01) to double* %5
+    load (2.105263e-01) from double* %7
+    store (2.105263e-01) to double* %7
+    load (2.105263e-01) from double* %5
+    load (2.105263e-01) from double* %7
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %5
+  load: 4.210526e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %6
+  load: 1.010526e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %7
+  load: 4.210526e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %8
+  load: 1.010526e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.9
+Round 0
+    alias entry   %3 = bitcast i8* %1 to double**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to double**, !dbg !10261
+  alias entry   %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261
+    alias entry   %8 = bitcast i8* %7 to double**, !dbg !10261
+  alias entry   %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261
+    alias entry   %11 = bitcast i8* %10 to double**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..10
+Round 0
+    alias entry   %67 = bitcast double* %3 to i64*, !dbg !10310
+    alias entry   %79 = bitcast double* %5 to i64*, !dbg !10310
+Round 1
+Round end
+    load (2.916667e-01) from double* %3
+    store (2.916667e-01) to double* %3
+    load (2.916667e-01) from double* %5
+    store (2.916667e-01) to double* %5
+    load (3.333333e-01) from double* %3
+    load (3.333333e-01) from double* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %3
+  load: 6.250000e-01		  store: 2.916667e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %4
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %5
+  load: 6.250000e-01		  store: 2.916667e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %6
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z20distUpdateLocalCinfolP4CommPKS_
+Round 0
+    base alias entry   %15 = bitcast i8** %14 to %struct.Comm**, !dbg !10269
+    base alias entry   %17 = bitcast i8** %16 to %struct.Comm**, !dbg !10269
+    base alias entry   %20 = bitcast i8** %19 to %struct.Comm**, !dbg !10269
+    base alias entry   %22 = bitcast i8** %21 to %struct.Comm**, !dbg !10269
+Round 1
+  base alias entry   %14 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269
+  base alias entry   %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269
+  base alias entry   %19 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269
+  base alias entry   %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269
+Round 2
+  base alias offset entry (1)   %5 = alloca [3 x i8*], align 8
+  base alias offset entry (1)   %6 = alloca [3 x i8*], align 8
+  base alias offset entry (2)   %5 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %19 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269
+  base alias offset entry (2)   %6 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269
+Round 3
+  base alias offset entry (1)   %9 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269
+  base alias offset entry (2)   %9 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269
+  base alias offset entry (1)   %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269
+  base alias offset entry (2)   %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269
+  base alias offset entry (1)   %14 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269
+  base alias offset entry (1)   %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269
+Round 4
+Round end
+  Frequency of %struct.Comm* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %2
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..13
+Round 0
+  alias entry   %33 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %26, !dbg !10304
+  alias entry   %36 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %35, i32 1, !dbg !10304
+  alias entry   %37 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %26, !dbg !10304
+  alias entry   %38 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %35, i32 1, !dbg !10304
+  alias entry   %39 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %26, i32 1, !dbg !10304
+  alias entry   %41 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %40, !dbg !10304
+  alias entry   %42 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %26, i32 1, !dbg !10304
+  alias entry   %43 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %40, !dbg !10304
+    alias entry   %44 = bitcast double* %38 to %struct.Comm*, !dbg !10304
+    alias entry   %46 = bitcast double* %36 to %struct.Comm*, !dbg !10304
+    alias entry   %49 = bitcast %struct.Comm* %43 to double*, !dbg !10304
+    alias entry   %51 = bitcast %struct.Comm* %41 to double*, !dbg !10304
+  alias entry   %67 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %61, i32 0, !dbg !10304
+  alias entry   %68 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %62, i32 0, !dbg !10304
+  alias entry   %69 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %63, i32 0, !dbg !10304
+  alias entry   %70 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %64, i32 0, !dbg !10304
+  alias entry   %71 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %65, i32 0, !dbg !10304
+  alias entry   %72 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %66, i32 0, !dbg !10304
+    alias entry   %73 = bitcast i64* %67 to <4 x i64>*, !dbg !10304
+    alias entry   %74 = bitcast i64* %68 to <4 x i64>*, !dbg !10304
+    alias entry   %75 = bitcast i64* %69 to <4 x i64>*, !dbg !10304
+    alias entry   %76 = bitcast i64* %70 to <4 x i64>*, !dbg !10304
+    alias entry   %77 = bitcast i64* %71 to <4 x i64>*, !dbg !10304
+    alias entry   %78 = bitcast i64* %72 to <4 x i64>*, !dbg !10304
+  alias entry   %97 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %61, i32 0, !dbg !10307
+  alias entry   %98 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %62, i32 0, !dbg !10307
+  alias entry   %99 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %63, i32 0, !dbg !10307
+  alias entry   %100 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %64, i32 0, !dbg !10307
+  alias entry   %101 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %65, i32 0, !dbg !10307
+  alias entry   %102 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %66, i32 0, !dbg !10307
+    alias entry   %103 = bitcast i64* %97 to <4 x i64>*, !dbg !10307
+    alias entry   %104 = bitcast i64* %98 to <4 x i64>*, !dbg !10307
+    alias entry   %105 = bitcast i64* %99 to <4 x i64>*, !dbg !10307
+    alias entry   %106 = bitcast i64* %100 to <4 x i64>*, !dbg !10307
+    alias entry   %107 = bitcast i64* %101 to <4 x i64>*, !dbg !10307
+    alias entry   %108 = bitcast i64* %102 to <4 x i64>*, !dbg !10307
+  alias entry   %139 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %61, i32 1, !dbg !10309
+  alias entry   %140 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %62, i32 1, !dbg !10309
+  alias entry   %141 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %63, i32 1, !dbg !10309
+  alias entry   %142 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %64, i32 1, !dbg !10309
+  alias entry   %143 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %65, i32 1, !dbg !10309
+  alias entry   %144 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %66, i32 1, !dbg !10309
+  alias entry   %151 = getelementptr inbounds double, double* %139, i64 -1, !dbg !10309
+    alias entry   %152 = bitcast double* %151 to <4 x double>*, !dbg !10309
+  alias entry   %153 = getelementptr inbounds double, double* %140, i64 -1, !dbg !10309
+    alias entry   %154 = bitcast double* %153 to <4 x double>*, !dbg !10309
+  alias entry   %155 = getelementptr inbounds double, double* %141, i64 -1, !dbg !10309
+    alias entry   %156 = bitcast double* %155 to <4 x double>*, !dbg !10309
+  alias entry   %157 = getelementptr inbounds double, double* %142, i64 -1, !dbg !10309
+    alias entry   %158 = bitcast double* %157 to <4 x double>*, !dbg !10309
+  alias entry   %159 = getelementptr inbounds double, double* %143, i64 -1, !dbg !10309
+    alias entry   %160 = bitcast double* %159 to <4 x double>*, !dbg !10309
+  alias entry   %161 = getelementptr inbounds double, double* %144, i64 -1, !dbg !10309
+    alias entry   %162 = bitcast double* %161 to <4 x double>*, !dbg !10309
+  alias entry   %183 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %182, i32 0, !dbg !10304
+  alias entry   %185 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %182, i32 0, !dbg !10307
+  alias entry   %188 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %182, i32 1, !dbg !10318
+  alias entry   %190 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %182, i32 1, !dbg !10309
+Round 1
+Round end
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %5
+    load (2.500000e+00) from %struct.Comm* %5
+    load (2.500000e+00) from %struct.Comm* %5
+    load (2.500000e+00) from %struct.Comm* %5
+    load (2.500000e+00) from %struct.Comm* %5
+    load (2.500000e+00) from %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    load (9.088235e+00) from %struct.Comm* %6
+    load (9.088235e+00) from %struct.Comm* %5
+    store (9.088235e+00) to %struct.Comm* %5
+    load (9.088235e+00) from %struct.Comm* %6
+    load (9.088235e+00) from %struct.Comm* %5
+    store (9.088235e+00) to %struct.Comm* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %5
+  load: 3.317647e+01		  store: 3.317647e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %6
+  load: 3.317647e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..14
+Round 0
+Round end
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %3
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %4
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z16distCleanCWandCUlPdP4Comm
+Round 0
+    base alias entry   %17 = bitcast i8** %16 to double**, !dbg !10269
+    base alias entry   %19 = bitcast i8** %18 to double**, !dbg !10269
+    base alias entry   %22 = bitcast i8** %21 to %struct.Comm**, !dbg !10269
+    base alias entry   %24 = bitcast i8** %23 to %struct.Comm**, !dbg !10269
+Round 1
+  base alias entry   %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269
+  base alias entry   %18 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269
+  base alias entry   %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269
+  base alias entry   %23 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269
+Round 2
+  base alias offset entry (1)   %5 = alloca [3 x i8*], align 8
+  base alias offset entry (1)   %6 = alloca [3 x i8*], align 8
+  base alias offset entry (2)   %5 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269
+  base alias offset entry (2)   %6 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %23 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269
+Round 3
+  base alias offset entry (1)   %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269
+  base alias offset entry (2)   %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269
+  base alias offset entry (1)   %13 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269
+  base alias offset entry (2)   %13 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269
+  base alias offset entry (1)   %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269
+  base alias offset entry (1)   %18 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269
+Round 4
+Round end
+  Frequency of double* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %2
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..18
+Round 0
+  alias entry   %30 = getelementptr inbounds double, double* %5, i64 %29, !dbg !10304
+  alias entry   %31 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %29, i32 0, !dbg !10309
+    alias entry   %34 = bitcast i64* %31 to i8*, !dbg !10299
+Round 1
+Round end
+    store (1.058333e+01) to double* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %5
+  load: 0.000000e+00		  store: 1.058333e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %6
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..19
+Round 0
+Round end
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %3
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %4
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z21fillRemoteCommunitiesRK5GraphiiRKmS3_RKSt6vectorIlSaIlEES8_S8_S8_S8_RKS4_I4CommSaIS9_EERSt3mapIlS9_St4lessIlESaISt4pairIKlS9_EEERSt13unordered_mapIllSt4hashIlESt8equal_toIlESaISH_ISI_lEEESM_
+Round 0
+  alias entry   %126 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !11433
+  alias entry   %130 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !11449
+  alias entry   %132 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !11460
+  alias entry   %190 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0
+  alias entry   %197 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0
+  alias entry   %301 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 2, i32 0, !dbg !11792
+    alias entry   %302 = bitcast %"struct.std::__detail::_Hash_node_base"* %301 to %"struct.std::__detail::_Hash_node"**, !dbg !11793
+    alias entry   %312 = bitcast %"class.std::unordered_map"* %12 to i8**, !dbg !11836
+  alias entry   %314 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 1, !dbg !11842
+    alias entry   %317 = bitcast %"struct.std::__detail::_Hash_node_base"* %301 to i8*, !dbg !11846
+  alias entry   %320 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %8, i64 0, i32 0, i32 0, i32 0
+  alias entry   %321 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0
+  alias entry   %322 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 0
+  alias entry   %323 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6
+    alias entry   %324 = bitcast %"class.std::vector.0"* %323 to i64*
+  alias entry   %325 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1
+    alias entry   %326 = bitcast i64** %325 to i64*
+  alias entry   %330 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %9, i64 0, i32 0, i32 0, i32 0
+  alias entry   %331 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6
+    alias entry   %332 = bitcast %"class.std::vector.0"* %331 to i64*
+  alias entry   %333 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1
+    alias entry   %334 = bitcast i64** %333 to i64*
+  alias entry   %818 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, !dbg !13393
+  alias entry   %819 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 1, !dbg !13405
+    alias entry   %820 = bitcast %"struct.std::_Rb_tree_node_base"** %819 to %"struct.std::_Rb_tree_node"**, !dbg !13405
+  alias entry   %826 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, !dbg !13419
+  alias entry   %827 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13425
+    base alias entry   %827 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13425
+  alias entry   %828 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13435
+    base alias entry   %828 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13435
+  alias entry   %829 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 2, !dbg !13437
+  alias entry   %830 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, !dbg !13442
+  alias entry   %831 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 1, !dbg !13447
+    alias entry   %832 = bitcast %"struct.std::_Rb_tree_node_base"** %831 to %"struct.std::_Rb_tree_node"**, !dbg !13447
+  alias entry   %838 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, !dbg !13452
+  alias entry   %839 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13455
+    base alias entry   %839 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13455
+  alias entry   %840 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13462
+    base alias entry   %840 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13462
+  alias entry   %841 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 2, !dbg !13464
+    alias entry   %846 = bitcast %"struct.std::_Rb_tree_node_base"** %819 to i64*
+    alias entry   %848 = bitcast %"struct.std::_Rb_tree_node_base"* %826 to %"struct.std::_Rb_tree_node"*
+    alias entry   %850 = bitcast %"struct.std::_Rb_tree_node_base"** %831 to i64*
+    alias entry   %852 = bitcast %"struct.std::_Rb_tree_node_base"* %838 to %"struct.std::_Rb_tree_node"*
+    alias entry   %967 = load %"struct.std::_Rb_tree_node_base"*, %"struct.std::_Rb_tree_node_base"** %827, align 8, !dbg !14017, !tbaa !14018
+    alias entry   %1023 = load %"struct.std::_Rb_tree_node_base"*, %"struct.std::_Rb_tree_node_base"** %839, align 8, !dbg !14306, !tbaa !14018
+Round 1
+Round end
+    load (1.000000e+00) from i64* %4
+    load (9.999994e-01) from i64* %3
+    load (9.999963e-01) from %class.Graph* %0
+    load (9.999963e-01) from %class.Graph* %0
+    load (9.999963e-01) from %class.Graph* %0
+    load (9.999803e+00) from %"class.std::vector.0"* %6
+    load (1.999960e+01) from %"class.std::vector.0"* %6
+    load (6.249782e+00) from %"class.std::vector.0"* %5
+    load (1.249956e+01) from %"class.std::vector.0"* %5
+    load (9.999777e-01) from %"class.std::unordered_map"* %12
+    load (9.999777e-01) from %"class.std::unordered_map"* %12
+    load (9.999777e-01) from %"class.std::unordered_map"* %12
+    load (1.999809e+01) from %"class.std::vector.0"* %8
+    load (1.999807e+01) from %"class.std::unordered_map"* %12
+    load (1.999807e+01) from %"class.std::unordered_map"* %12
+Warning: wrong traversal order, or recursive call
+On function .omp_outlined..22
+Round 0
+  alias entry   %31 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+  alias entry   %33 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %35 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i64* %2
+    load (3.200000e-01) from %"class.std::vector.0"* %3
+    load (3.200000e-01) from %"class.std::vector.0"* %4
+    load (3.200000e-01) from %"class.std::vector.0"* %6
+    load (1.020000e+01) from i64* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %3
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %4
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 1.020000e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %6
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.24
+Round 0
+    alias entry   %3 = bitcast i8* %1 to i64**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to i64**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..25
+Round 0
+  alias entry   %33 = getelementptr inbounds %"class.std::vector.29", %"class.std::vector.29"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %35 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i32* %2
+    load (3.157895e-01) from %"class.std::vector.29"* %4
+    load (3.157895e-01) from %"class.std::vector.0"* %3
+    load (2.105263e-01) from i64* %5
+    store (2.105263e-01) to i64* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %3
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.29"* %4
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 2.105263e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.27
+Round 0
+    alias entry   %3 = bitcast i8* %1 to i64**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to i64**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..28
+Round 0
+  alias entry   %32 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i32* %2
+    load (3.157895e-01) from %"class.std::vector.0"* %4
+    load (2.105263e-01) from i64* %3
+    store (2.105263e-01) to i64* %3
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %3
+  load: 2.105263e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %4
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..30
+Round 0
+  alias entry   %20 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %2, i64 0, i32 0, i32 0, i32 0, !dbg !10503
+  alias entry   %34 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %35 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %7, i64 0, i32 0, i32 0, i32 0
+  alias entry   %36 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %6, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from %"class.std::vector.0"* %2
+    load (2.047500e+02) from %"class.std::vector.0"* %4
+    load (2.047500e+02) from %"class.std::vector.15"* %7
+    load (2.047500e+02) from %"class.std::vector.52"* %6
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %4
+  load: 2.047500e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.52"* %6
+  load: 2.047500e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.15"* %7
+  load: 2.047500e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z22createCommunityMPITypev
+Round 0
+Round end
+On function _Z23destroyCommunityMPITypev
+Round 0
+Round end
+On function _Z23updateRemoteCommunitiesRK5GraphRSt6vectorI4CommSaIS3_EERKSt3mapIlS3_St4lessIlESaISt4pairIKlS3_EEEii
+Round 0
+  alias entry   %19 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !10869
+  alias entry   %46 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !11050
+  alias entry   %48 = getelementptr inbounds %"class.std::map", %"class.std::map"* %2, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !11068
+    alias entry   %49 = bitcast %"struct.std::_Rb_tree_node_base"** %48 to i64*, !dbg !11068
+  alias entry   %51 = getelementptr inbounds %"class.std::map", %"class.std::map"* %2, i64 0, i32 0, i32 0, i32 1, !dbg !11085
+  alias entry   %55 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6
+    alias entry   %56 = bitcast %"class.std::vector.0"* %55 to i64*
+  alias entry   %57 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1
+    alias entry   %58 = bitcast i64** %57 to i64*
+Round 1
+Round end
+    load (1.000000e+00) from %class.Graph* %0
+    load (9.999994e-01) from %class.Graph* %0
+    load (9.999994e-01) from %"class.std::map"* %2
+    load (1.999985e+01) from %class.Graph* %0
+    load (1.999985e+01) from %class.Graph* %0
+  Frequency of %class.Graph* %0
+  load: 4.199970e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.15"* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::map"* %2
+  load: 9.999994e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..32
+Round 0
+  alias entry   %28 = getelementptr inbounds %"class.std::vector.66", %"class.std::vector.66"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %30 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i32* %2
+    load (3.137255e-01) from %"class.std::vector.66"* %4
+    load (3.137255e-01) from %"class.std::vector.0"* %3
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %3
+  load: 3.137255e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.66"* %4
+  load: 3.137255e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.34
+Round 0
+    alias entry   %3 = bitcast i8* %1 to i64**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to i64**, !dbg !10261
+  alias entry   %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261
+    alias entry   %8 = bitcast i8* %7 to i64**, !dbg !10261
+  alias entry   %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261
+    alias entry   %11 = bitcast i8* %10 to i64**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..35
+Round 0
+  alias entry   %36 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %38 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i32* %2
+    load (3.157895e-01) from %"class.std::vector.0"* %4
+    load (3.157895e-01) from %"class.std::vector.0"* %6
+    load (2.105263e-01) from i64* %3
+    store (2.105263e-01) to i64* %3
+    load (2.105263e-01) from i64* %5
+    store (2.105263e-01) to i64* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %3
+  load: 2.105263e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %4
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 2.105263e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %6
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..37
+Round 0
+  alias entry   %26 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %3, i64 0, i32 0, i32 0, i32 0
+  alias entry   %27 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %4, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i64* %2
+    load (6.350000e+00) from %"class.std::vector.52"* %3
+    load (6.350000e+00) from %"class.std::vector.15"* %4
+    load (6.350000e+00) from i64* %5
+    load (2.047500e+02) from i64* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.52"* %3
+  load: 6.350000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.15"* %4
+  load: 6.350000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 2.111000e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z18exchangeVertexReqsRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ii
+Round 0
+  alias entry   %15 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10306
+  alias entry   %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10319
+  alias entry   %51 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 1, !dbg !10485
+    alias entry   %52 = bitcast i64** %51 to i64*, !dbg !10485
+    alias entry   %54 = bitcast %"class.std::vector.0"* %4 to i64*, !dbg !10489
+  alias entry   %71 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 1, !dbg !10517
+    alias entry   %72 = bitcast i64** %71 to i64*, !dbg !10517
+    alias entry   %74 = bitcast %"class.std::vector.0"* %3 to i64*, !dbg !10518
+    alias entry   %91 = bitcast %"class.std::vector.0"* %3 to i8**
+  alias entry   %94 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+  alias entry   %99 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0, !dbg !10612
+    alias entry   %100 = bitcast %"class.std::vector.0"* %4 to i8**, !dbg !10612
+  alias entry   %129 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 1, !dbg !10673
+    alias entry   %130 = bitcast i64** %129 to i64*, !dbg !10673
+    alias entry   %132 = bitcast %"class.std::vector.0"* %5 to i64*, !dbg !10674
+  alias entry   %148 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 1, !dbg !10696
+    alias entry   %149 = bitcast i64** %148 to i64*, !dbg !10696
+    alias entry   %151 = bitcast %"class.std::vector.0"* %6 to i64*, !dbg !10697
+  alias entry   %191 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0
+  alias entry   %251 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0
+  alias entry   %310 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 2, !dbg !11244
+  alias entry   %311 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 2, !dbg !11245
+    alias entry   %312 = bitcast i64** %310 to i64*, !dbg !11249
+    alias entry   %314 = bitcast i64** %311 to i64*, !dbg !11250
+  alias entry   %320 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 2, !dbg !11279
+  alias entry   %321 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 2, !dbg !11280
+    alias entry   %322 = bitcast i64** %320 to i64*, !dbg !11284
+    alias entry   %324 = bitcast i64** %321 to i64*, !dbg !11285
+Round 1
+Round end
+    load (1.000000e+00) from %class.Graph* %0
+    load (1.000000e+00) from %class.Graph* %0
+    load (9.999984e-01) from %"class.std::vector.0"* %4
+    load (9.999984e-01) from %"class.std::vector.0"* %4
+Warning: wrong traversal order, or recursive call
+On function .omp_outlined..39
+Round 0
+  alias entry   %26 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 0, i32 0, i32 0, i32 0
+  alias entry   %27 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 1, i32 0, i32 0, i32 0
+  alias entry   %28 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 6
+    alias entry   %29 = bitcast %"class.std::vector.0"* %28 to i64*
+  alias entry   %30 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 6, i32 0, i32 0, i32 1
+    alias entry   %31 = bitcast i64** %30 to i64*
+  alias entry   %32 = getelementptr inbounds %"class.std::vector.29", %"class.std::vector.29"* %5, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.988141e+02) from %class.Graph* %3
+    load (3.180957e+03) from %class.Graph* %3
+    load (3.180957e+03) from %class.Graph* %3
+    load (3.180957e+03) from %class.Graph* %3
+    load (1.590478e+03) from %"class.std::vector.29"* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %class.Graph* %3
+  load: 9.741684e+03		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.29"* %5
+  load: 1.590478e+03		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.41
+Round 0
+    alias entry   %3 = bitcast i8* %1 to i64**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to i64**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..42
+Round 0
+  alias entry   %32 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i32* %2
+    load (3.157895e-01) from %"class.std::vector.0"* %4
+    load (2.105263e-01) from i64* %3
+    store (2.105263e-01) to i64* %3
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %3
+  load: 2.105263e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %4
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi
+Round 0
+  alias entry   %68 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 2, !dbg !11180
+  alias entry   %85 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !11380
+    alias entry   %86 = bitcast i64** %85 to i64*, !dbg !11380
+    alias entry   %88 = bitcast %class.Graph* %2 to i64*, !dbg !11384
+    alias entry   %93 = bitcast %class.Graph* %2 to i8**, !dbg !11392
+  alias entry   %98 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 6, !dbg !11399
+  alias entry   %99 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 6, i32 0, i32 0, i32 1, !dbg !11402
+    alias entry   %100 = bitcast i64** %99 to i64*, !dbg !11402
+    alias entry   %102 = bitcast %"class.std::vector.0"* %98 to i64*, !dbg !11403
+    alias entry   %107 = bitcast %"class.std::vector.0"* %98 to i8**, !dbg !11410
+  alias entry   %112 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 1, !dbg !11417
+  alias entry   %113 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 1, i32 0, i32 0, i32 1, !dbg !11424
+    alias entry   %114 = bitcast %struct.Edge** %113 to i64*, !dbg !11424
+    alias entry   %116 = bitcast %"class.std::vector.5"* %112 to i64*, !dbg !11428
+    alias entry   %121 = bitcast %"class.std::vector.5"* %112 to i8**, !dbg !11440
+Round 1
+Round end
+    load (9.999981e-01) from %class.Graph* %2
+Warning: wrong traversal order, or recursive call
+On function .omp_outlined..45
+Round 0
+Round end
+    call (1.058333e+01, 2.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %5
+    call (1.058333e+01, 1.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %6
+    call (1.058333e+01, 1.721875e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Edge* %7
+    call (1.058333e+01, 8.992188e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %8
+    call (1.058333e+01, 0.000000e+00, 1.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %9
+    call (1.058333e+01, 7.500000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using double* %10
+    call (1.058333e+01, 2.121875e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %11
+    call (1.058333e+01, 5.000000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %12
+    call (1.058333e+01, 5.000000e-01, 5.000000e-01, 0.000000e+00, 0.000000e+00) using double* %14
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 2.116667e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %6
+  load: 1.058333e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Edge* %7
+  load: 1.822318e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %8
+  load: 9.516732e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %9
+  load: 0.000000e+00		  store: 1.058333e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %10
+  load: 7.937500e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %11
+  load: 2.245651e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %12
+  load: 5.291667e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %14
+  load: 5.291667e+00		  store: 5.291667e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..46
+Round 0
+Round end
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %3
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %4
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Edge* %5
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %6
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %7
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %8
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %9
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %10
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %12
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..49
+Round 0
+  alias entry   %28 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (3.200000e-01) from %"class.std::vector.0"* %3
+    load (3.200000e-01) from i64** %4
+    load (3.200000e-01) from i64** %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %3
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64** %4
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64** %5
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function main
+Round 0
+    base alias entry   %14 = alloca i8**, align 8
+    alias entry   %33 = load i8**, i8*** %14, align 8, !dbg !10342, !tbaa !10335
+Round 1
+Round end
+  Frequency of i8** %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZN11GenerateRGGC2ElP19ompi_communicator_t
+Round 0
+  alias entry   %4 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 0, !dbg !10266
+  alias entry   %5 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10276
+    base alias entry   %5 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10276
+  alias entry   %6 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 5, !dbg !10279
+    alias entry   %8 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10281, !tbaa !10278
+  alias entry   %9 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 4, !dbg !10282
+  alias entry   %11 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 7, !dbg !10284
+  alias entry   %12 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 6, !dbg !10287
+  alias entry   %36 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 1, !dbg !10320
+    alias entry   %101 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10478, !tbaa !10278
+    alias entry   %172 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10565, !tbaa !10278
+  alias entry   %184 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2, !dbg !10579
+    alias entry   %191 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10583, !tbaa !10278
+Round 1
+Round end
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    load (1.000000e+00) from %class.GenerateRGG* %0
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    load (1.000000e+00) from %class.GenerateRGG* %0
+    load (5.000000e-01) from %class.GenerateRGG* %0
+    store (2.500000e-01) to %class.GenerateRGG* %0
+    store (3.437500e-01) to %class.GenerateRGG* %0
+    store (2.500000e-01) to %class.GenerateRGG* %0
+    load (1.000000e+00) from %class.GenerateRGG* %0
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    load (6.250000e-01) from %class.GenerateRGG* %0
+    load (6.250000e-01) from %class.GenerateRGG* %0
+    load (6.250000e-01) from %class.GenerateRGG* %0
+    load (7.656250e-01) from %class.GenerateRGG* %0
+    load (7.656250e-01) from %class.GenerateRGG* %0
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    load (1.000000e+00) from %class.GenerateRGG* %0
+    load (1.000000e+00) from %class.GenerateRGG* %0
+  Frequency of %class.GenerateRGG* %0
+  load: 8.906250e+00		  store: 6.843750e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.ompi_communicator_t* %2
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZN11GenerateRGG8generateEbbi
+Round 0
+  alias entry   %27 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 1, !dbg !10306
+  alias entry   %75 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 6, !dbg !10592
+  alias entry   %112 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 7, !dbg !10709
+  alias entry   %153 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 0, !dbg !10828
+  alias entry   %156 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 4, !dbg !10832
+  alias entry   %160 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 5, !dbg !10836
+  alias entry   %430 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10915
+  alias entry   %819 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !11101
+  alias entry   %895 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2
+  alias entry   %1233 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2
+  alias entry   %1536 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2
+Round 1
+Round end
+    load (1.000000e+00) from %class.GenerateRGG* %0
+    load (6.249994e-01) from %class.GenerateRGG* %0
+    load (9.999990e-01) from %class.GenerateRGG* %0
+    load (4.999995e-01) from %class.GenerateRGG* %0
+    load (3.124994e-01) from %class.GenerateRGG* %0
+    load (9.999985e-01) from %class.GenerateRGG* %0
+    load (4.999993e-01) from %class.GenerateRGG* %0
+    load (3.124992e-01) from %class.GenerateRGG* %0
+    load (9.999971e-01) from %class.GenerateRGG* %0
+    load (9.999971e-01) from %class.GenerateRGG* %0
+    load (9.999962e-01) from %class.GenerateRGG* %0
+    load (9.999962e-01) from %class.GenerateRGG* %0
+    load (4.999966e-01) from %class.GenerateRGG* %0
+    load (4.999971e-01) from %class.GenerateRGG* %0
+    load (4.999971e-01) from %class.GenerateRGG* %0
+    load (4.999966e-01) from %class.GenerateRGG* %0
+    load (9.999923e-01) from %class.GenerateRGG* %0
+    load (9.999914e-01) from %class.GenerateRGG* %0
+    load (3.749968e-01) from %class.GenerateRGG* %0
+    load (3.749964e-01) from %class.GenerateRGG* %0
+    load (9.999890e-01) from %class.GenerateRGG* %0
+    load (9.998746e-01) from %class.GenerateRGG* %0
+    load (3.199362e+02) from %class.GenerateRGG* %0
+    load (3.199361e+02) from %class.GenerateRGG* %0
+    load (6.249210e-01) from %class.GenerateRGG* %0
+    load (6.249210e-01) from %class.GenerateRGG* %0
+    load (6.249210e-01) from %class.GenerateRGG* %0
+    load (9.998736e-01) from %class.GenerateRGG* %0
+    load (9.998736e-01) from %class.GenerateRGG* %0
+    load (9.998736e-01) from %class.GenerateRGG* %0
+    load (9.998736e-01) from %class.GenerateRGG* %0
+    load (9.998726e-01) from %class.GenerateRGG* %0
+    load (9.998726e-01) from %class.GenerateRGG* %0
+    load (9.998726e-01) from %class.GenerateRGG* %0
+    load (9.998726e-01) from %class.GenerateRGG* %0
+    load (9.998717e-01) from %class.GenerateRGG* %0
+    load (9.998717e-01) from %class.GenerateRGG* %0
+    load (9.998717e-01) from %class.GenerateRGG* %0
+    load (9.998717e-01) from %class.GenerateRGG* %0
+    load (9.998707e-01) from %class.GenerateRGG* %0
+    load (9.998707e-01) from %class.GenerateRGG* %0
+    load (9.998707e-01) from %class.GenerateRGG* %0
+    load (9.998707e-01) from %class.GenerateRGG* %0
+    load (9.998698e-01) from %class.GenerateRGG* %0
+    load (4.999349e-01) from %class.GenerateRGG* %0
+    load (2.499674e-01) from %class.GenerateRGG* %0
+    load (7.997451e+01) from %class.GenerateRGG* %0
+    load (3.998725e+01) from %class.GenerateRGG* %0
+    load (3.998725e+01) from %class.GenerateRGG* %0
+    load (7.997448e+01) from %class.GenerateRGG* %0
+    load (4.999063e-01) from %class.GenerateRGG* %0
+    load (2.499531e-01) from %class.GenerateRGG* %0
+    load (7.996993e+01) from %class.GenerateRGG* %0
+    load (3.998497e+01) from %class.GenerateRGG* %0
+    load (3.998497e+01) from %class.GenerateRGG* %0
+    load (7.996991e+01) from %class.GenerateRGG* %0
+    load (9.998126e-01) from %class.GenerateRGG* %0
+    load (9.998116e-01) from %class.GenerateRGG* %0
+    load (9.998116e-01) from %class.GenerateRGG* %0
+    load (9.998116e-01) from %class.GenerateRGG* %0
+    load (9.998107e-01) from %class.GenerateRGG* %0
+    load (9.998107e-01) from %class.GenerateRGG* %0
+    load (9.998107e-01) from %class.GenerateRGG* %0
+    load (9.998091e-01) from %class.GenerateRGG* %0
+    load (9.998091e-01) from %class.GenerateRGG* %0
+    load (9.998091e-01) from %class.GenerateRGG* %0
+    load (9.998082e-01) from %class.GenerateRGG* %0
+    load (9.998082e-01) from %class.GenerateRGG* %0
+    load (9.998082e-01) from %class.GenerateRGG* %0
+    load (9.998072e-01) from %class.GenerateRGG* %0
+    load (9.998015e-01) from %class.GenerateRGG* %0
+    load (6.248724e-01) from %class.GenerateRGG* %0
+    load (6.248718e-01) from %class.GenerateRGG* %0
+    load (1.952724e-01) from %class.GenerateRGG* %0
+    load (3.905445e-01) from %class.GenerateRGG* %0
+    load (3.905442e-01) from %class.GenerateRGG* %0
+    load (6.248393e-01) from %class.GenerateRGG* %0
+    load (1.249644e+01) from %class.GenerateRGG* %0
+    load (1.249643e+01) from %class.GenerateRGG* %0
+    load (1.171538e+00) from %class.GenerateRGG* %0
+    load (5.857690e-01) from %class.GenerateRGG* %0
+    load (2.928845e-01) from %class.GenerateRGG* %0
+    load (1.464422e-01) from %class.GenerateRGG* %0
+    load (6.248387e-01) from %class.GenerateRGG* %0
+    load (6.248381e-01) from %class.GenerateRGG* %0
+    load (1.249638e+01) from %class.GenerateRGG* %0
+    load (6.248253e-01) from %class.GenerateRGG* %0
+    load (3.905154e-01) from %class.GenerateRGG* %0
+    load (2.440719e-01) from %class.GenerateRGG* %0
+    load (6.248247e-01) from %class.GenerateRGG* %0
+    load (4.881438e+00) from %class.GenerateRGG* %0
+    load (9.997431e-01) from %class.GenerateRGG* %0
+    load (9.997421e-01) from %class.GenerateRGG* %0
+    load (9.997406e-01) from %class.GenerateRGG* %0
+    load (9.997406e-01) from %class.GenerateRGG* %0
+    load (6.248378e-01) from %class.GenerateRGG* %0
+    load (1.999481e+01) from %class.GenerateRGG* %0
+    load (9.997388e-01) from %class.GenerateRGG* %0
+    load (9.997385e-01) from %class.GenerateRGG* %0
+  Frequency of %class.GenerateRGG* %0
+  load: 1.248245e+03		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZN14BinaryEdgeList4readEiiiSs
+Round 0
+  alias entry   %39 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 4, !dbg !10380
+  alias entry   %41 = getelementptr inbounds %"class.std::basic_string", %"class.std::basic_string"* %4, i64 0, i32 0, i32 0, !dbg !10388
+  alias entry   %99 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 0, !dbg !10514
+    alias entry   %100 = bitcast %class.BinaryEdgeList* %0 to i8*, !dbg !10515
+  alias entry   %104 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 1, !dbg !10518
+    alias entry   %105 = bitcast i64* %104 to i8*, !dbg !10519
+  alias entry   %118 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 2, !dbg !10532
+  alias entry   %183 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 3, !dbg !10605
+Round 1
+Round end
+    load (9.999971e-01) from %class.BinaryEdgeList* %0
+    load (9.999971e-01) from %"class.std::basic_string"* %4
+    load (6.249948e-01) from %class.BinaryEdgeList* %0
+    load (9.999905e-01) from %class.BinaryEdgeList* %0
+    store (9.999905e-01) to %class.BinaryEdgeList* %0
+    load (9.999895e-01) from %class.BinaryEdgeList* %0
+    load (9.999886e-01) from %class.BinaryEdgeList* %0
+    load (9.999886e-01) from %class.BinaryEdgeList* %0
+    load (9.999729e-01) from %class.BinaryEdgeList* %0
+    store (9.999729e-01) to %class.BinaryEdgeList* %0
+    load (9.999714e-01) from %class.BinaryEdgeList* %0
+    load (9.999714e-01) from %class.BinaryEdgeList* %0
+    load (9.999547e-01) from %class.BinaryEdgeList* %0
+    load (1.999909e+01) from %class.BinaryEdgeList* %0
+  Frequency of %class.BinaryEdgeList* %0
+  load: 2.962391e+01		  store: 1.999963e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::basic_string"* %4
+  load: 9.999971e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt8_Rb_treeIlSt4pairIKl4CommESt10_Select1stIS3_ESt4lessIlESaIS3_EE8_M_eraseEPSt13_Rb_tree_nodeIS3_E
+Round 0
+Round end
+Warning: wrong traversal order, or recursive call
+On function _ZN5GraphC2EllllP19ompi_communicator_t
+Round 0
+  alias entry   %8 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, !dbg !10272
+  alias entry   %9 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 1, !dbg !10272
+  alias entry   %10 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10309
+    alias entry   %11 = bitcast %class.Graph* %0 to i8*, !dbg !10309
+  alias entry   %12 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 3, !dbg !10320
+  alias entry   %13 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 4, !dbg !10322
+  alias entry   %14 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 5, !dbg !10324
+  alias entry   %15 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, !dbg !10272
+    alias entry   %16 = bitcast %"class.std::vector.0"* %15 to i8*, !dbg !10332
+  alias entry   %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10334
+    base alias entry   %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10334
+  alias entry   %18 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 9, !dbg !10336
+    alias entry   %21 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %17, align 8, !dbg !10338, !tbaa !10335
+  alias entry   %22 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 8, !dbg !10339
+  alias entry   %28 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !10361
+    alias entry   %29 = bitcast i64** %28 to i64*, !dbg !10361
+    alias entry   %31 = bitcast %class.Graph* %0 to i64*, !dbg !10365
+  alias entry   %45 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 1, i32 0, i32 0, i32 1, !dbg !10416
+    alias entry   %46 = bitcast %struct.Edge** %45 to i64*, !dbg !10416
+    alias entry   %48 = bitcast %"class.std::vector.5"* %9 to i64*, !dbg !10420
+  alias entry   %64 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1, !dbg !10455
+    alias entry   %65 = bitcast i64** %64 to i64*, !dbg !10455
+    alias entry   %67 = bitcast %"class.std::vector.0"* %15 to i64*, !dbg !10456
+  alias entry   %76 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %15, i64 0, i32 0, i32 0, i32 0
+  alias entry   %111 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %15, i64 0, i32 0, i32 0, i32 0, !dbg !10511
+  alias entry   %117 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %9, i64 0, i32 0, i32 0, i32 0, !dbg !10547
+  alias entry   %123 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, i32 0, i32 0, i32 0, !dbg !10576
+Round 1
+Round end
+    store (1.000000e+00) to %class.Graph* %0
+    store (1.000000e+00) to %class.Graph* %0
+    store (1.000000e+00) to %class.Graph* %0
+    store (1.000000e+00) to %class.Graph* %0
+    store (1.000000e+00) to %class.Graph* %0
+    load (9.999990e-01) from %class.Graph* %0
+    load (9.999980e-01) from %class.Graph* %0
+    load (9.999980e-01) from %class.Graph* %0
+    load (9.999980e-01) from %class.Graph* %0
+Warning: wrong traversal order, or recursive call
+On function _ZN3LCGC2EjPdlP19ompi_communicator_t
+Round 0
+  alias entry   %6 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 3, !dbg !10268
+  alias entry   %7 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 4, !dbg !10277
+  alias entry   %8 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 6, !dbg !10279
+    base alias entry   %8 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 6, !dbg !10279
+  alias entry   %9 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, !dbg !10281
+    alias entry   %10 = bitcast %"class.std::vector.0"* %9 to i8*, !dbg !10300
+  alias entry   %11 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0, !dbg !10302
+    base alias entry   %11 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0, !dbg !10302
+  alias entry   %12 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 1, !dbg !10306
+    alias entry   %15 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %11, align 8, !dbg !10308, !tbaa !10305
+  alias entry   %16 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 2, !dbg !10309
+  alias entry   %20 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, i32 0, i32 0, i32 1, !dbg !10326
+    alias entry   %21 = bitcast i64** %20 to i64*, !dbg !10326
+    alias entry   %23 = bitcast %"class.std::vector.0"* %9 to i64*, !dbg !10330
+  alias entry   %42 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 5, !dbg !10359
+  alias entry   %45 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %9, i64 0, i32 0, i32 0, i32 0, !dbg !10374
+  alias entry   %52 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 5, !dbg !10399
+    alias entry   %53 = bitcast i64* %52 to i8*, !dbg !10400
+    alias entry   %54 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %11, align 8, !dbg !10401, !tbaa !10305
+Round 1
+Round end
+    store (1.000000e+00) to %class.LCG* %0
+    store (1.000000e+00) to %class.LCG* %0
+    store (1.000000e+00) to %class.LCG* %0
+    store (1.000000e+00) to %class.LCG* %0
+    load (9.999989e-01) from %class.LCG* %0
+    load (9.999982e-01) from %class.LCG* %0
+    load (9.999982e-01) from %class.LCG* %0
+    load (9.999982e-01) from %class.LCG* %0
+Warning: wrong traversal order, or recursive call
+On function _ZNSt24uniform_int_distributionIiEclISt26linear_congruential_engineImLm16807ELm0ELm2147483647EEEEiRT_RKNS0_10param_typeE
+Round 0
+  alias entry   %5 = getelementptr inbounds %"struct.std::uniform_int_distribution<int>::param_type", %"struct.std::uniform_int_distribution<int>::param_type"* %2, i64 0, i32 1, !dbg !10267
+  alias entry   %8 = getelementptr inbounds %"struct.std::uniform_int_distribution<int>::param_type", %"struct.std::uniform_int_distribution<int>::param_type"* %2, i64 0, i32 0, !dbg !10279
+  alias entry   %19 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0
+  alias entry   %37 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0
+  alias entry   %51 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0, !dbg !10376
+Round 1
+Round end
+    load (1.000000e+00) from %"struct.std::uniform_int_distribution<int>::param_type"* %2
+    load (1.000000e+00) from %"struct.std::uniform_int_distribution<int>::param_type"* %2
+    load (5.000000e-01) from %"class.std::linear_congruential_engine"* %1
+    store (5.000000e-01) to %"class.std::linear_congruential_engine"* %1
+Warning: wrong traversal order, or recursive call
+On function _ZNSt6vectorIlSaIlEEaSERKS1_
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10278
+    alias entry   %6 = bitcast i64** %5 to i64*, !dbg !10278
+    alias entry   %8 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10285
+  alias entry   %12 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10294
+    alias entry   %13 = bitcast i64** %12 to i64*, !dbg !10294
+    alias entry   %15 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10296
+  alias entry   %33 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10459
+  alias entry   %41 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10490
+    alias entry   %42 = bitcast i64** %41 to i64*, !dbg !10490
+  alias entry   %53 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 0, !dbg !10573
+  alias entry   %73 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10633
+  alias entry   %76 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10635
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.0"* %1
+    load (6.250000e-01) from %"class.std::vector.0"* %1
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    load (9.765625e-02) from %"class.std::vector.0"* %1
+    load (9.765625e-02) from %"class.std::vector.0"* %0
+    load (9.765625e-02) from %"class.std::vector.0"* %0
+    load (9.765625e-02) from %"class.std::vector.0"* %1
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    store (6.250000e-01) to %"class.std::vector.0"* %0
+  Frequency of %"class.std::vector.0"* %0
+  load: 2.695312e+00		  store: 1.250000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %1
+  load: 1.445312e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorIlSaIlEE14_M_fill_insertEN9__gnu_cxx17__normal_iteratorIPlS1_EEmRKl
+Round 0
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10281
+    alias entry   %9 = bitcast i64** %8 to i64*, !dbg !10281
+  alias entry   %11 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10288
+    alias entry   %12 = bitcast i64** %11 to i64*, !dbg !10288
+    alias entry   %632 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10728
+  alias entry   %848 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10820
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (3.125000e-01) from i64* %3
+    load (9.765625e-02) from %"class.std::vector.0"* %0
+    store (1.562500e-01) to %"class.std::vector.0"* %0
+    store (1.562500e-01) to %"class.std::vector.0"* %0
+    load (9.765625e-02) from %"class.std::vector.0"* %0
+    store (1.562500e-01) to %"class.std::vector.0"* %0
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    load (3.125000e-01) from i64* %3
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+  Frequency of %"class.std::vector.0"* %0
+  load: 2.382812e+00		  store: 1.406250e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %3
+  load: 6.250000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorI4EdgeSaIS0_EE17_M_default_appendEm
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274
+    alias entry   %6 = bitcast %struct.Edge** %5 to i64*, !dbg !10274
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281
+    alias entry   %87 = bitcast %"class.std::vector.5"* %0 to i64*, !dbg !10375
+  alias entry   %108 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 0
+  alias entry   %115 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10431
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.5"* %0
+    load (6.250000e-01) from %"class.std::vector.5"* %0
+    load (3.125000e-01) from %"class.std::vector.5"* %0
+    load (1.953125e-01) from %"class.std::vector.5"* %0
+    load (1.953125e-01) from %"class.std::vector.5"* %0
+    load (3.125000e-01) from %"class.std::vector.5"* %0
+    store (3.125000e-01) to %"class.std::vector.5"* %0
+    store (3.125000e-01) to %"class.std::vector.5"* %0
+  Frequency of %"class.std::vector.5"* %0
+  load: 2.265625e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZN3LCG18parallel_prefix_opEv
+Round 0
+  alias entry   %10 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 4, !dbg !10283
+  alias entry   %169 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 1, !dbg !10361
+  alias entry   %175 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 2, !dbg !10269
+  alias entry   %179 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0
+  alias entry   %188 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 5, !dbg !10372
+  alias entry   %252 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, i32 0, i32 0, i32 0, !dbg !10372
+Round 1
+Round end
+    load (1.000000e+00) from %class.LCG* %0
+    load (1.000000e+00) from %class.LCG* %0
+    load (1.000000e+00) from %class.LCG* %0
+    load (1.000000e+00) from %class.LCG* %0
+    load (2.005882e+01) from %class.LCG* %0
+    load (2.005882e+01) from %class.LCG* %0
+    load (2.005882e+01) from %class.LCG* %0
+    load (2.005882e+01) from %class.LCG* %0
+    load (1.000000e+00) from %class.LCG* %0
+  Frequency of %class.LCG* %0
+  load: 8.523529e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorI9EdgeTupleSaIS0_EE17_M_default_appendEm
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10273
+    alias entry   %6 = bitcast %struct.EdgeTuple** %5 to i64*, !dbg !10273
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10280
+    alias entry   %65 = bitcast %"class.std::vector.84"* %0 to i64*, !dbg !10369
+  alias entry   %86 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0
+  alias entry   %93 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10425
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.84"* %0
+    load (6.250000e-01) from %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    load (1.953125e-01) from %"class.std::vector.84"* %0
+    load (1.953125e-01) from %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    store (3.125000e-01) to %"class.std::vector.84"* %0
+    store (3.125000e-01) to %"class.std::vector.84"* %0
+  Frequency of %"class.std::vector.84"* %0
+  load: 2.578125e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZSt9__find_ifIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_E_ET_SC_SC_T0_St26random_access_iterator_tag
+Round 0
+Round end
+On function _ZNSt6vectorI9EdgeTupleSaIS0_EE15_M_range_insertIN9__gnu_cxx17__normal_iteratorIPS0_S2_EEEEvS7_T_S8_St20forward_iterator_tag
+Round 0
+  alias entry   %13 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10344
+    alias entry   %14 = bitcast %struct.EdgeTuple** %13 to i64*, !dbg !10344
+  alias entry   %16 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10351
+    alias entry   %17 = bitcast %struct.EdgeTuple** %16 to i64*, !dbg !10351
+    alias entry   %120 = bitcast %"class.std::vector.84"* %0 to i64*, !dbg !10799
+  alias entry   %141 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0
+  alias entry   %146 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10851
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.84"* %0
+    load (6.250000e-01) from %"class.std::vector.84"* %0
+    load (9.765625e-02) from %"class.std::vector.84"* %0
+    store (1.562500e-01) to %"class.std::vector.84"* %0
+    load (9.765625e-02) from %"class.std::vector.84"* %0
+    store (1.562500e-01) to %"class.std::vector.84"* %0
+    load (9.765625e-02) from %"class.std::vector.84"* %0
+    store (1.562500e-01) to %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    load (1.953125e-01) from %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    store (3.125000e-01) to %"class.std::vector.84"* %0
+    store (3.125000e-01) to %"class.std::vector.84"* %0
+    store (3.125000e-01) to %"class.std::vector.84"* %0
+  Frequency of %"class.std::vector.84"* %0
+  load: 2.675781e+00		  store: 1.406250e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZSt16__introsort_loopIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEElZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_T0_T1_
+Round 0
+Round end
+On function _ZSt22__final_insertion_sortIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_T0_
+Round 0
+Round end
+On function _ZSt13__heap_selectIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_SC_T0_
+Round 0
+Round end
+On function _ZSt13__adjust_heapIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEElS2_ZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_T0_SD_T1_T2_
+Round 0
+Round end
+On function _ZSt22__move_median_to_firstIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_SC_SC_T0_
+Round 0
+Round end
+On function _ZNSt6vectorIlSaIlEE17_M_default_appendEm
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274
+    alias entry   %6 = bitcast i64** %5 to i64*, !dbg !10274
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281
+    alias entry   %20 = bitcast i64** %8 to i64*, !dbg !10380
+    alias entry   %21 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10381
+  alias entry   %42 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0
+    alias entry   %65 = bitcast %"class.std::vector.0"* %0 to i8**, !dbg !10628
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    load (1.953125e-01) from %"class.std::vector.0"* %0
+    load (1.953125e-01) from %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+  Frequency of %"class.std::vector.0"* %0
+  load: 2.265625e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorIdSaIdEE17_M_default_appendEm
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274
+    alias entry   %6 = bitcast double** %5 to i64*, !dbg !10274
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281
+    alias entry   %20 = bitcast double** %8 to i64*, !dbg !10381
+    alias entry   %21 = bitcast %"class.std::vector.10"* %0 to i64*, !dbg !10382
+  alias entry   %42 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 0
+    alias entry   %65 = bitcast %"class.std::vector.10"* %0 to i8**, !dbg !10630
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.10"* %0
+    load (6.250000e-01) from %"class.std::vector.10"* %0
+    load (3.125000e-01) from %"class.std::vector.10"* %0
+    load (3.125000e-01) from %"class.std::vector.10"* %0
+    load (1.953125e-01) from %"class.std::vector.10"* %0
+    load (1.953125e-01) from %"class.std::vector.10"* %0
+    store (3.125000e-01) to %"class.std::vector.10"* %0
+    store (3.125000e-01) to %"class.std::vector.10"* %0
+  Frequency of %"class.std::vector.10"* %0
+  load: 2.265625e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorI4CommSaIS0_EE17_M_default_appendEm
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10460
+    alias entry   %6 = bitcast %struct.Comm** %5 to i64*, !dbg !10460
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10467
+    alias entry   %20 = bitcast %"class.std::vector.15"* %0 to i64*, !dbg !10551
+  alias entry   %41 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 0
+  alias entry   %48 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10607
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.15"* %0
+    load (6.250000e-01) from %"class.std::vector.15"* %0
+    load (3.125000e-01) from %"class.std::vector.15"* %0
+    load (3.125000e-01) from %"class.std::vector.15"* %0
+    load (1.953125e-01) from %"class.std::vector.15"* %0
+    load (1.953125e-01) from %"class.std::vector.15"* %0
+    load (3.125000e-01) from %"class.std::vector.15"* %0
+    store (3.125000e-01) to %"class.std::vector.15"* %0
+    store (3.125000e-01) to %"class.std::vector.15"* %0
+  Frequency of %"class.std::vector.15"* %0
+  load: 2.578125e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt27__uninitialized_default_n_1ILb0EE18__uninit_default_nIPSt13unordered_setIlSt4hashIlESt8equal_toIlESaIlEEmEEvT_T0_
+Round 0
+Round end
+  Frequency of %"class.std::unordered_set"* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt10_HashtableIlSt4pairIKllESaIS2_ENSt8__detail10_Select1stESt8equal_toIlESt4hashIlENS4_18_Mod_range_hashingENS4_20_Default_ranged_hashENS4_20_Prime_rehash_policyENS4_17_Hashtable_traitsILb0ELb0ELb1EEEE21_M_insert_unique_nodeEmmPNS4_10_Hash_nodeIS2_Lb0EEE
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 4, !dbg !10268
+  alias entry   %6 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 4, i32 1, !dbg !10275
+  alias entry   %8 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 1, !dbg !10282
+  alias entry   %10 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 3, !dbg !10288
+  alias entry   %17 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 0
+  alias entry   %29 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 2, i32 0, !dbg !10428
+    alias entry   %30 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to %"struct.std::__detail::_Hash_node"**, !dbg !10429
+  alias entry   %32 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %29, i64 0, i32 0, !dbg !10432
+    alias entry   %35 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to i64*
+    base alias entry   %44 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %28, i64 %43, !dbg !10509
+    alias entry   %61 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10529, !tbaa !10511
+  alias entry   %62 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %61, i64 0, i32 0, !dbg !10530
+    alias entry   %77 = bitcast %"class.std::_Hashtable"* %0 to i8**, !dbg !10550
+    alias entry   %83 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i8*, !dbg !10618
+  alias entry   %87 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 0, !dbg !10296
+  alias entry   %94 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, !dbg !10627
+    alias entry   %95 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i64*, !dbg !10628
+    base alias entry   %97 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %96, i64 0, i32 0, !dbg !10630
+  alias entry   %99 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 2, i32 0, !dbg !10639
+    alias entry   %100 = bitcast %"struct.std::__detail::_Hash_node_base"* %99 to i64*, !dbg !10640
+  alias entry   %102 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, !dbg !10641
+  alias entry   %103 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, i32 0, !dbg !10641
+    alias entry   %104 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i64*, !dbg !10642
+  alias entry   %105 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %99, i64 0, i32 0, !dbg !10645
+    base alias entry   %105 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %99, i64 0, i32 0, !dbg !10645
+    base alias entry   %114 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %85, i64 %113, !dbg !10676
+    base alias entry   %118 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %117, i64 %86, !dbg !10678
+Round 1
+Warning: the first offset is not constant
+    alias entry   %45 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10509, !tbaa !10511
+    alias entry   %57 = bitcast %"struct.std::__detail::_Hash_node_base"* %45 to i64*, !dbg !10525
+  base alias offset entry (0)   %96 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %88, align 8, !dbg !10629, !tbaa !10511
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Round 2
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Round end
+    load (1.000000e+00) from %"class.std::_Hashtable"* %0
+    load (1.000000e+00) from %"class.std::_Hashtable"* %0
+    load (1.000000e+00) from %"class.std::_Hashtable"* %0
+    load (5.000000e-01) from %"class.std::_Hashtable"* %0
+    load (4.999995e-01) from %"class.std::_Hashtable"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable"* %0
+    load (3.749996e+00) from %"class.std::_Hashtable"* %0
+    store (3.749996e+00) to %"class.std::_Hashtable"* %0
+    load (6.249994e+00) from %"class.std::_Hashtable"* %0
+    store (6.249994e+00) to %"class.std::_Hashtable"* %0
+    store (4.768372e-07) to %"class.std::_Hashtable"* %0
+    load (4.999995e-01) from %"class.std::_Hashtable"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable"* %0
+    store (6.249997e-01) to %"struct.std::__detail::_Hash_node"* %3
+    load (3.749998e-01) from %"class.std::_Hashtable"* %0
+    store (3.749998e-01) to %"struct.std::__detail::_Hash_node"* %3
+    store (3.749998e-01) to %"class.std::_Hashtable"* %0
+    load (3.749998e-01) from %"struct.std::__detail::_Hash_node"* %3
+    load (2.343749e-01) from %"class.std::_Hashtable"* %0
+    load (2.343749e-01) from %"class.std::_Hashtable"* %0
+    load (9.999995e-01) from %"class.std::_Hashtable"* %0
+    store (9.999995e-01) to %"class.std::_Hashtable"* %0
+  Frequency of %"class.std::_Hashtable"* %0
+  load: 1.634374e+01		  store: 1.287499e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"struct.std::__detail::_Hash_node"* %3
+  load: 3.749998e-01		  store: 9.999995e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt10_HashtableIllSaIlENSt8__detail9_IdentityESt8equal_toIlESt4hashIlENS1_18_Mod_range_hashingENS1_20_Default_ranged_hashENS1_20_Prime_rehash_policyENS1_17_Hashtable_traitsILb0ELb1ELb1EEEE21_M_insert_unique_nodeEmmPNS1_10_Hash_nodeIlLb0EEE
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 4, !dbg !10268
+  alias entry   %6 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 4, i32 1, !dbg !10275
+  alias entry   %8 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 1, !dbg !10282
+  alias entry   %10 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 3, !dbg !10288
+  alias entry   %17 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 0
+  alias entry   %29 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 2, i32 0, !dbg !10428
+    alias entry   %30 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to %"struct.std::__detail::_Hash_node.61"**, !dbg !10429
+  alias entry   %32 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %29, i64 0, i32 0, !dbg !10432
+    alias entry   %35 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to i64*
+    base alias entry   %44 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %28, i64 %43, !dbg !10469
+    alias entry   %61 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10489, !tbaa !10471
+  alias entry   %62 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %61, i64 0, i32 0, !dbg !10490
+    alias entry   %77 = bitcast %"class.std::_Hashtable.34"* %0 to i8**, !dbg !10510
+    alias entry   %83 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i8*, !dbg !10578
+  alias entry   %87 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 0, !dbg !10296
+  alias entry   %94 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, !dbg !10587
+    alias entry   %95 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i64*, !dbg !10588
+    base alias entry   %97 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %96, i64 0, i32 0, !dbg !10590
+  alias entry   %99 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 2, i32 0, !dbg !10599
+    alias entry   %100 = bitcast %"struct.std::__detail::_Hash_node_base"* %99 to i64*, !dbg !10600
+  alias entry   %102 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, !dbg !10601
+  alias entry   %103 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, i32 0, !dbg !10601
+    alias entry   %104 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i64*, !dbg !10602
+  alias entry   %105 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %99, i64 0, i32 0, !dbg !10605
+    base alias entry   %105 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %99, i64 0, i32 0, !dbg !10605
+    base alias entry   %114 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %85, i64 %113, !dbg !10630
+    base alias entry   %118 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %117, i64 %86, !dbg !10632
+Round 1
+Warning: the first offset is not constant
+    alias entry   %45 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10469, !tbaa !10471
+    alias entry   %57 = bitcast %"struct.std::__detail::_Hash_node_base"* %45 to i64*, !dbg !10485
+  base alias offset entry (0)   %96 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %88, align 8, !dbg !10589, !tbaa !10471
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Round 2
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Round end
+    load (1.000000e+00) from %"class.std::_Hashtable.34"* %0
+    load (1.000000e+00) from %"class.std::_Hashtable.34"* %0
+    load (1.000000e+00) from %"class.std::_Hashtable.34"* %0
+    load (5.000000e-01) from %"class.std::_Hashtable.34"* %0
+    load (4.999995e-01) from %"class.std::_Hashtable.34"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable.34"* %0
+    load (3.749996e+00) from %"class.std::_Hashtable.34"* %0
+    store (3.749996e+00) to %"class.std::_Hashtable.34"* %0
+    load (6.249994e+00) from %"class.std::_Hashtable.34"* %0
+    store (6.249994e+00) to %"class.std::_Hashtable.34"* %0
+    store (4.768372e-07) to %"class.std::_Hashtable.34"* %0
+    load (4.999995e-01) from %"class.std::_Hashtable.34"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable.34"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable.34"* %0
+    store (6.249997e-01) to %"struct.std::__detail::_Hash_node.61"* %3
+    load (3.749998e-01) from %"class.std::_Hashtable.34"* %0
+    store (3.749998e-01) to %"struct.std::__detail::_Hash_node.61"* %3
+    store (3.749998e-01) to %"class.std::_Hashtable.34"* %0
+    load (3.749998e-01) from %"struct.std::__detail::_Hash_node.61"* %3
+    load (2.343749e-01) from %"class.std::_Hashtable.34"* %0
+    load (2.343749e-01) from %"class.std::_Hashtable.34"* %0
+    load (9.999995e-01) from %"class.std::_Hashtable.34"* %0
+    store (9.999995e-01) to %"class.std::_Hashtable.34"* %0
+  Frequency of %"class.std::_Hashtable.34"* %0
+  load: 1.634374e+01		  store: 1.287499e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"struct.std::__detail::_Hash_node.61"* %3
+  load: 3.749998e-01		  store: 9.999995e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorI8CommInfoSaIS0_EE17_M_default_appendEm
+Round 0
+  alias entry   %7 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10273
+    alias entry   %8 = bitcast %struct.CommInfo** %7 to i64*, !dbg !10273
+  alias entry   %10 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10280
+    alias entry   %59 = bitcast %struct.CommInfo** %10 to i64*, !dbg !10394
+    alias entry   %60 = bitcast %"class.std::vector.52"* %0 to i64*, !dbg !10395
+  alias entry   %81 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 0
+  alias entry   %89 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10449
+    alias entry   %143 = bitcast %"class.std::vector.52"* %0 to i8**, !dbg !10651
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.52"* %0
+    load (6.250000e-01) from %"class.std::vector.52"* %0
+    load (3.125000e-01) from %"class.std::vector.52"* %0
+    load (3.125000e-01) from %"class.std::vector.52"* %0
+    load (1.953125e-01) from %"class.std::vector.52"* %0
+    load (1.953125e-01) from %"class.std::vector.52"* %0
+    load (3.125000e-01) from %"class.std::vector.52"* %0
+    store (3.125000e-01) to %"class.std::vector.52"* %0
+    store (3.125000e-01) to %"class.std::vector.52"* %0
+  Frequency of %"class.std::vector.52"* %0
+  load: 2.578125e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _GLOBAL__sub_I_main.cpp
+Round 0
+Round end
+On function .omp_offloading.descriptor_unreg
+Round 0
+Round end
+  Frequency of i8* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_offloading.descriptor_reg.nvptx64-nvidia-cuda
+Round 0
+Round end
+  ---- Identify Target Regions ----
+  target call:   %49 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z21distComputeModularityRK5GraphP4CommPKddi_l396.region_id, i32 5, i8** nonnull %24, i8** nonnull %26, i64* nonnull %28, i64* getelementptr inbounds ([5 x i64], [5 x i64]* @.offload_maptypes, i64 0, i64 0), i32 0, i32 0), !dbg !10317
+  target call:   %24 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %9, i8** nonnull %11, i64* nonnull %13, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0), !dbg !10269
+  target call:   %26 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %11, i8** nonnull %13, i64* nonnull %15, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0), !dbg !10269
+  target call:   %258 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0)
+          to label %259 unwind label %319, !dbg !11559
+  target call:   %276 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368.region_id, i32 12, i8** nonnull %200, i8** nonnull %202, i64* nonnull %204, i64* getelementptr inbounds ([12 x i64], [12 x i64]* @.offload_maptypes.47, i64 0, i64 0), i32 0, i32 0), !dbg !11584
+  target call:   %325 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0)
+          to label %326 unwind label %319, !dbg !11667
+  ---- Target Distance Calculation ----
+_Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi converges after 3 iterations
+target 0: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) 
+target 1: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) 
+target 2: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) 
+target 3: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 9.152967e+00) (4: 1.000095e+00) (5: 2.000190e+00) 
+target 4: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 8.152880e+00) (4: 9.091440e+00) (5: 1.000095e+00) 
+target 5: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 7.152791e+00) (4: 8.091353e+00) (5: 9.029914e+00) 
+  ---- OMP (main.cpp, powerpc64le-unknown-linux-gnu) ----
+new entry   %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100)
+          to label %92 unwind label %291, !dbg !11387
+new entry   %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100)
+          to label %106 unwind label %295, !dbg !11405
+new entry   %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100)
+          to label %120 unwind label %299, !dbg !11431
+new entry   %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %129 unwind label %303, !dbg !11449
+new entry   %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %135 unwind label %307, !dbg !11462
+new entry   %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %141 unwind label %311, !dbg !11478
+new entry   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+new entry   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+new entry   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+Round 0
+  base alias entry   %130 = bitcast i64** %29 to i8**, !dbg !11450
+  base alias entry   %142 = bitcast i64** %30 to i8**, !dbg !11479
+  alias entry   %147 = bitcast i8* %145 to %struct.Comm*, !dbg !11487
+  alias entry   %158 = bitcast i8* %156 to double*, !dbg !11511
+  base alias entry   %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1
+  base alias entry   %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1
+  base alias entry   %192 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 2
+  base alias entry   %193 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 2
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias entry   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias entry   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias entry   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias entry   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias entry   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias entry   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias entry   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias entry   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias entry   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias entry   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias entry   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias entry   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias entry   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias entry   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias entry   %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1
+  base alias entry   %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1
+Warning: reach to function declaration __kmpc_fork_teams
+  alias entry (func arg) %struct.Comm* %1
+  alias entry (func arg) double* %2
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 1
+Round 1
+  base alias entry   %35 = bitcast i8** %34 to double**, !dbg !10317
+  base alias entry   %37 = bitcast i8** %36 to double**, !dbg !10317
+  base alias entry   %45 = bitcast i8** %44 to %struct.Comm**, !dbg !10317
+  base alias entry   %47 = bitcast i8** %46 to %struct.Comm**, !dbg !10317
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias entry   %29 = alloca i64*, align 8
+  base alias entry   %30 = alloca i64*, align 8
+  base alias offset entry (1)   %16 = alloca [3 x i8*], align 8
+  base alias offset entry (1)   %17 = alloca [3 x i8*], align 8
+  base alias offset entry (2)   %16 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %192 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 2
+  base alias offset entry (2)   %17 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %193 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 2
+  base alias offset entry (1)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (1)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (2)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-1)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (2)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-1)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (3)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-2)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (-1)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (3)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-2)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (-1)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (-3)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (-2)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (-1)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (-3)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (-2)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (-1)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (-4)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (-3)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (-2)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (-4)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (-3)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (-2)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (6)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-5)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (-4)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (-3)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (6)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-5)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (-4)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (-3)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (7)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-6)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (-5)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (-4)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (-1)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (7)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-6)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (-5)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (-4)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (-1)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (8)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-7)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-6)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-5)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-2)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-1)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (8)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-7)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-6)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-5)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-2)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-1)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-8)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-7)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-6)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-3)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-2)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-1)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-8)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-7)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-6)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-3)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-2)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-1)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (10)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-9)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-8)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-7)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-4)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-3)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-2)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (10)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-9)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-8)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-7)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-4)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-3)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-2)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-10)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-9)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-8)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-5)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-4)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-3)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-1)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-10)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-9)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-8)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-5)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-4)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-3)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-1)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %189, align 8, !dbg !11559
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %190, align 8, !dbg !11559
+Warning: reach to function declaration __kmpc_fork_teams
+  alias entry   %263 = load i64*, i64** %29, align 8, !dbg !11584, !tbaa !11451
+  alias entry   %264 = load i64*, i64** %30, align 8, !dbg !11584, !tbaa !11451
+  alias entry   %274 = ptrtoint i64* %263 to i64, !dbg !11584
+  alias entry   %275 = ptrtoint i64* %264 to i64, !dbg !11584
+  base alias entry   %215 = bitcast i8** %214 to i64*
+  base alias entry   %217 = bitcast i8** %216 to i64*
+  base alias entry   %220 = bitcast i8** %219 to i64*
+  base alias entry   %222 = bitcast i8** %221 to i64*
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 2
+Warning: reach to function declaration __kmpc_fork_call
+Round 2
+  base alias entry   %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317
+  base alias entry   %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317
+  base alias entry   %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317
+  base alias entry   %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias offset entry (2)   %11 = alloca [5 x i8*], align 8
+  base alias offset entry (2)   %12 = alloca [5 x i8*], align 8
+  base alias offset entry (-1)   %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317
+  base alias offset entry (-1)   %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317
+  base alias offset entry (4)   %11 = alloca [5 x i8*], align 8
+  base alias offset entry (-2)   %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317
+  base alias offset entry (4)   %12 = alloca [5 x i8*], align 8
+  base alias offset entry (-2)   %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias entry   %126 = bitcast i64** %29 to i8*, !dbg !11447
+  base alias entry   %139 = bitcast i64** %30 to i8*, !dbg !11477
+  base alias offset entry (1)   %184 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 0
+  base alias offset entry (2)   %184 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 0
+  base alias offset entry (1)   %186 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 0
+  base alias offset entry (2)   %186 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 0
+  base alias offset entry (1)   %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1
+  base alias offset entry (1)   %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1
+  base alias offset entry (1)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (2)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (3)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (6)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (7)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (8)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (10)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (1)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (2)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (3)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (6)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (7)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (8)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (10)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (1)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (2)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (5)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (6)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (7)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (9)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (1)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (2)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (5)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (6)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (7)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (9)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (1)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (4)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (5)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (6)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (8)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (1)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (4)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (5)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (6)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (8)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (3)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (4)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (5)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (7)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (3)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (4)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (5)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (7)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (2)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (3)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (4)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (6)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias entry   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (2)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (3)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (4)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (6)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias entry   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (1)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (2)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (3)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (5)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias entry   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (1)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (2)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (3)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (5)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias entry   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (1)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (2)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (4)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (1)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (2)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (4)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (1)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (3)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (1)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (3)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (2)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (2)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (1)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (1)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %189, align 8, !dbg !11559
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %190, align 8, !dbg !11559
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 3
+Warning: reach to function declaration __kmpc_fork_call
+Round 3
+  base alias offset entry (4)   %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317
+  base alias offset entry (2)   %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317
+  base alias offset entry (4)   %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317
+  base alias offset entry (2)   %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317
+  base alias offset entry (3)   %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317
+  base alias offset entry (1)   %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317
+  base alias offset entry (3)   %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317
+  base alias offset entry (1)   %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317
+  base alias offset entry (2)   %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317
+  base alias offset entry (2)   %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317
+  base alias offset entry (1)   %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317
+  base alias offset entry (1)   %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias offset entry (4)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (4)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (5)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-1)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (5)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-1)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (-2)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (-1)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (-2)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (-1)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (-3)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (-2)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (-3)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (-2)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (-4)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-3)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-4)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-3)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-5)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-4)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-5)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-4)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-6)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-5)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-6)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-5)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-7)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-6)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-7)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-6)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %189, align 8, !dbg !11559
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %190, align 8, !dbg !11559
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 4
+Warning: reach to function declaration __kmpc_fork_call
+Round 4
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias offset entry (4)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (5)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (4)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (5)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (3)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (4)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (3)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (4)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (2)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (3)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (2)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (3)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (1)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (2)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (1)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (2)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (1)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (1)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %189, align 8, !dbg !11559
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %190, align 8, !dbg !11559
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 5
+Warning: reach to function declaration __kmpc_fork_call
+Round 5
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %189, align 8, !dbg !11559
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %190, align 8, !dbg !11559
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 6
+Warning: reach to function declaration __kmpc_fork_call
+Round 6
+Warning: reach to function declaration __kmpc_fork_teams
+Round end
+  ---- Access Frequency Analysis ----
+  target call (1.625206e+01, 0.000000e+00, 5.076920e+00) using   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  target call (1.625206e+01, 0.000000e+00, 1.015380e+01) using   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+  target call (1.625204e+01, 1.015380e+01, 0.000000e+00) using   %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100)
+          to label %92 unwind label %291, !dbg !11387
+  target call (1.625204e+01, 5.076920e+00, 0.000000e+00) using   %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100)
+          to label %106 unwind label %295, !dbg !11405
+  target call (1.625204e+01, 8.757690e+01, 0.000000e+00) using   %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100)
+          to label %120 unwind label %299, !dbg !11431
+  target call (1.625204e+01, 4.569230e+01, 0.000000e+00) using   %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %129 unwind label %303, !dbg !11449
+  target call (1.625204e+01, 0.000000e+00, 5.076920e+00) using   %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %141 unwind label %311, !dbg !11478
+  target call (1.625204e+01, 3.807690e+00, 0.000000e+00) using   %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %135 unwind label %307, !dbg !11462
+  target call (1.625204e+01, 1.078710e+02, 0.000000e+00) using   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  target call (1.625204e+01, 2.538460e+00, 0.000000e+00) using   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+  target call (1.625204e+01, 2.538460e+00, 2.538460e+00) using   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+  target call (1.625202e+01, 1.015380e+01, 1.015380e+01) using   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  target call (1.625202e+01, 1.015380e+01, 0.000000e+00) using   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+  call (1.625199e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  call (1.625199e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+Frequency of   %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100)
+          to label %92 unwind label %291, !dbg !11387
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.650200e+02		  store: 0.000000e+00 (target)
+Frequency of   %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100)
+          to label %106 unwind label %295, !dbg !11405
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 8.251031e+01		  store: 0.000000e+00 (target)
+Frequency of   %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100)
+          to label %120 unwind label %299, !dbg !11431
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.423303e+03		  store: 0.000000e+00 (target)
+Frequency of   %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %129 unwind label %303, !dbg !11449
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 7.425931e+02		  store: 0.000000e+00 (target)
+Frequency of   %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %135 unwind label %307, !dbg !11462
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 6.188273e+01		  store: 0.000000e+00 (target)
+Frequency of   %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %141 unwind label %311, !dbg !11478
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 8.251031e+01 (target)
+Frequency of   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.918144e+03		  store: 2.475302e+02 (target)
+Frequency of   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 2.062750e+02		  store: 1.650201e+02 (target)
+Frequency of   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 4.125515e+01		  store: 4.125515e+01 (target)
+  ---- Optimization Preparation ----
+Rank 9 for   %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %135 unwind label %307, !dbg !11462
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 6.188273e+01		  store: 0.000000e+00 (target)
+Rank 8 for   %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100)
+          to label %106 unwind label %295, !dbg !11405
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 8.251031e+01		  store: 0.000000e+00 (target)
+Rank 7 for   %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %141 unwind label %311, !dbg !11478
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 8.251031e+01 (target)
+Rank 6 for   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 4.125515e+01		  store: 4.125515e+01 (target)
+Rank 5 for   %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100)
+          to label %92 unwind label %291, !dbg !11387
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.650200e+02		  store: 0.000000e+00 (target)
+Rank 4 for   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 2.062750e+02		  store: 1.650201e+02 (target)
+Rank 3 for   %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %129 unwind label %303, !dbg !11449
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 7.425931e+02		  store: 0.000000e+00 (target)
+Rank 2 for   %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100)
+          to label %120 unwind label %299, !dbg !11431
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.423303e+03		  store: 0.000000e+00 (target)
+Rank 1 for   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.918144e+03		  store: 2.475302e+02 (target)
+  ---- Data Mapping Optimization ----
+  target call:   %49 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z21distComputeModularityRK5GraphP4CommPKddi_l396.region_id, i32 5, i8** nonnull %24, i8** nonnull %26, i64* nonnull %28, i64* getelementptr inbounds ([5 x i64], [5 x i64]* @.offload_maptypes, i64 0, i64 0), i32 0, i32 0), !dbg !10317
+@.offload_maptypes = private unnamed_addr constant [5 x i64] [i64 800, i64 547, i64 33, i64 547, i64 33]
+  arg 2 (0.000000e+00, 0.000000e+00; 4.125515e+01, 4.125515e+01) is   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+    size is   %127 = shl i64 %69, 3, !dbg !11448
+    global reuse is 0x06
+    local reuse is 1.600380e+02, 1.280304e+03 after adjustment;		    scaled local reuse is 0x500
+    reuse distance is 0x01
+  arg 4 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x01
+    local reuse is 1.600380e+02, 2.560608e+03 after adjustment;		    scaled local reuse is 0xa00
+    reuse distance is 0x01
+    map type changed: @.offload_maptypes.0 = private unnamed_addr constant [5 x i64] [i64 800, i64 547, i64 1100853829665, i64 547, i64 1102195986465]
+  target call:   %24 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %9, i8** nonnull %11, i64* nonnull %13, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0), !dbg !10269
+@.offload_maptypes.15 = private unnamed_addr constant [3 x i64] [i64 800, i64 35, i64 33]
+  target call:   %26 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %11, i8** nonnull %13, i64* nonnull %15, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0), !dbg !10269
+@.offload_maptypes.20 = private unnamed_addr constant [3 x i64] [i64 800, i64 34, i64 34]
+  target call:   %258 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0)
+          to label %259 unwind label %319, !dbg !11559
+@.offload_maptypes.20 = private unnamed_addr constant [3 x i64] [i64 800, i64 34, i64 34]
+  arg 1 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x01
+    local reuse is 5.076920e+00, 4.061536e+01 after adjustment;		    scaled local reuse is 0x028
+    reuse distance is 0x01
+  arg 2 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x04
+    local reuse is 1.015380e+01, 1.624608e+02 after adjustment;		    scaled local reuse is 0x0a2
+    reuse distance is 0x01
+    map type changed: @.offload_maptypes.20.1 = private unnamed_addr constant [3 x i64] [i64 800, i64 1099553574946, i64 1099681513506]
+  target call:   %276 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368.region_id, i32 12, i8** nonnull %200, i8** nonnull %202, i64* nonnull %204, i64* getelementptr inbounds ([12 x i64], [12 x i64]* @.offload_maptypes.47, i64 0, i64 0), i32 0, i32 0), !dbg !11584
+@.offload_maptypes.47 = private unnamed_addr constant [12 x i64] [i64 800, i64 33, i64 33, i64 33, i64 33, i64 34, i64 33, i64 33, i64 35, i64 800, i64 35, i64 800]
+  arg 1 (0.000000e+00, 0.000000e+00; 1.650200e+02, 0.000000e+00) is   %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100)
+          to label %92 unwind label %291, !dbg !11387
+    size is   %90 = sub i64 %87, %89, !dbg !11386
+    global reuse is 0x05
+    local reuse is 1.015380e+01, 8.123040e+01 after adjustment;		    scaled local reuse is 0x051
+    reuse distance is 0x09
+  arg 2 (0.000000e+00, 0.000000e+00; 8.251031e+01, 0.000000e+00) is   %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100)
+          to label %106 unwind label %295, !dbg !11405
+    size is   %104 = sub i64 %101, %103, !dbg !11404
+    global reuse is 0x08
+    local reuse is 5.076920e+00, 4.061536e+01 after adjustment;		    scaled local reuse is 0x028
+    reuse distance is 0x09
+  arg 3 (0.000000e+00, 0.000000e+00; 1.423303e+03, 0.000000e+00) is   %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100)
+          to label %120 unwind label %299, !dbg !11431
+    size is   %118 = sub i64 %115, %117, !dbg !11430
+    global reuse is 0x02
+    local reuse is 8.757690e+01, 1.401230e+03 after adjustment;		    scaled local reuse is 0x579
+    reuse distance is 0x09
+  arg 4 (0.000000e+00, 0.000000e+00; 7.425931e+02, 0.000000e+00) is   %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %129 unwind label %303, !dbg !11449
+    size is   %127 = shl i64 %69, 3, !dbg !11448
+    global reuse is 0x03
+    local reuse is 4.569230e+01, 3.655384e+02 after adjustment;		    scaled local reuse is 0x16d
+    reuse distance is 0x09
+  arg 5 (0.000000e+00, 0.000000e+00; 0.000000e+00, 8.251031e+01) is   %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %141 unwind label %311, !dbg !11478
+    size is   %127 = shl i64 %69, 3, !dbg !11448
+    global reuse is 0x07
+    local reuse is 5.076920e+00, 4.061536e+01 after adjustment;		    scaled local reuse is 0x028
+    reuse distance is 0x09
+  arg 6 (0.000000e+00, 0.000000e+00; 6.188273e+01, 0.000000e+00) is   %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %135 unwind label %307, !dbg !11462
+    size is   %127 = shl i64 %69, 3, !dbg !11448
+    global reuse is 0x09
+    local reuse is 3.807690e+00, 3.046152e+01 after adjustment;		    scaled local reuse is 0x01e
+    reuse distance is 0x09
+  arg 7 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x01
+    local reuse is 1.078710e+02, 1.725936e+03 after adjustment;		    scaled local reuse is 0x6bd
+    reuse distance is 0x01
+  arg 8 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x04
+    local reuse is 2.538460e+00, 4.061536e+01 after adjustment;		    scaled local reuse is 0x028
+    reuse distance is 0x01
+  arg 10 (0.000000e+00, 0.000000e+00; 4.125515e+01, 4.125515e+01) is   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+    size is   %127 = shl i64 %69, 3, !dbg !11448
+    global reuse is 0x06
+    local reuse is 5.076920e+00, 4.061536e+01 after adjustment;		    scaled local reuse is 0x028
+    reuse distance is 0x09
+    map type changed: @.offload_maptypes.47.2 = private unnamed_addr constant [12 x i64] [i64 800, i64 9895689605153, i64 9895646625825, i64 9897073713185, i64 9895987392545, i64 9895646621730, i64 9895636144161, i64 1101320425505, i64 1099553587235, i64 800, i64 9895646617635, i64 800]
+  target call:   %325 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0)
+          to label %326 unwind label %319, !dbg !11667
+@.offload_maptypes.15 = private unnamed_addr constant [3 x i64] [i64 800, i64 35, i64 33]
+  arg 1 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x01
+    local reuse is 2.030760e+01, 3.249216e+02 after adjustment;		    scaled local reuse is 0x144
+    reuse distance is 0x07
+  arg 2 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x04
+    local reuse is 1.015380e+01, 1.624608e+02 after adjustment;		    scaled local reuse is 0x0a2
+    reuse distance is 0x07
+    map type changed: @.offload_maptypes.15.3 = private unnamed_addr constant [3 x i64] [i64 800, i64 7696921137187, i64 7696751280161]
+1 warning generated.
+In file included from main.cpp:58:
+In file included from ./dspl_gpu_kernel.hpp:58:
+In file included from ./graph.hpp:56:
+./utils.hpp:263:56: warning: using floating point absolute value function 'fabs' when argument is of integer type [-Wabsolute-value]
+                drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult); // 0-1
+                                                       ^
+./utils.hpp:263:56: note: use function 'std::abs' instead
+                drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult); // 0-1
+                                                       ^~~~
+                                                       std::abs
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+change loop scale from 32.0 to 1.0
+  ---- Function Argument Access Frequency CG Analysis ----
+On function __omp_offloading_33_128194f__Z21distComputeModularityRK5GraphP4CommPKddi_l396
+Round 0
+  alias entry   %71 = getelementptr inbounds double, double* %2, i64 %68, !dbg !45
+  alias entry   %74 = getelementptr inbounds %struct.Comm, %struct.Comm* %4, i64 %68, i32 1, !dbg !52
+Round 1
+Round end
+change loop scale from 32.0 to 1.0
+    load (1.600385e+02) from double* %2
+    load (1.600385e+02) from %struct.Comm* %4
+    load (6.227106e-02) from double* %1
+    store (6.227106e-02) to double* %1
+    load (6.227106e-02) from double* %3
+    store (6.227106e-02) to double* %3
+  Frequency of double* %1
+  load: 6.227106e-02		  store: 6.227106e-02 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %2
+  load: 1.600385e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %3
+  load: 6.227106e-02		  store: 6.227106e-02 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %4
+  load: 1.600385e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function __omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436
+Round 0
+  alias entry   %41 = getelementptr inbounds %struct.Comm, %struct.Comm* %2, i64 %40, i32 0, !dbg !45
+  alias entry   %43 = getelementptr inbounds %struct.Comm, %struct.Comm* %1, i64 %40, i32 0, !dbg !53
+  alias entry   %46 = getelementptr inbounds %struct.Comm, %struct.Comm* %2, i64 %40, i32 1, !dbg !55
+  alias entry   %48 = getelementptr inbounds %struct.Comm, %struct.Comm* %1, i64 %40, i32 1, !dbg !57
+Round 1
+Round end
+change loop scale from 32.0 to 1.0
+    load (5.076923e+00) from %struct.Comm* %2
+    load (5.076923e+00) from %struct.Comm* %1
+    store (5.076923e+00) to %struct.Comm* %1
+    load (5.076923e+00) from %struct.Comm* %2
+    load (5.076923e+00) from %struct.Comm* %1
+    store (5.076923e+00) to %struct.Comm* %1
+  Frequency of %struct.Comm* %1
+  load: 1.015385e+01		  store: 1.015385e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %2
+  load: 1.015385e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function __omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455
+Round 0
+  alias entry   %41 = getelementptr inbounds double, double* %1, i64 %40, !dbg !45
+  alias entry   %42 = getelementptr inbounds %struct.Comm, %struct.Comm* %2, i64 %40, i32 1, !dbg !52
+  alias entry   %43 = getelementptr inbounds %struct.Comm, %struct.Comm* %2, i64 %40, i32 0, !dbg !57
+Round 1
+Round end
+change loop scale from 32.0 to 1.0
+    store (5.076923e+00) to double* %1
+    store (5.076923e+00) to %struct.Comm* %2
+    store (5.076923e+00) to %struct.Comm* %2
+  Frequency of double* %1
+  load: 0.000000e+00		  store: 5.076923e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %2
+  load: 0.000000e+00		  store: 1.015385e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function __omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368
+Round 0
+Round end
+change loop scale from 32.0 to 1.0
+Warning: wrong traversal order, or recursive call
+On function _Z27distExecuteLouvainIterationlPKlS0_PK4EdgeS0_PlPKdP4CommS8_dPdi
+Round 0
+  alias entry   %91 = getelementptr inbounds i64, i64* %2, i64 %90, !dbg !35
+  alias entry   %93 = getelementptr inbounds i64, i64* %4, i64 %0, !dbg !38
+  alias entry   %96 = getelementptr inbounds i64, i64* %1, i64 %0, !dbg !40
+  alias entry   %99 = getelementptr inbounds i64, i64* %1, i64 %98, !dbg !42
+  alias entry   %103 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %95, i32 0, !dbg !45
+  alias entry   %105 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %95, i32 1, !dbg !49
+    base alias entry   %178 = select i1 %119, %struct.Edge** %13, %struct.Edge** %177
+  alias entry   %188 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %187, i32 0, !dbg !69
+  alias entry   %189 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %187, i32 1, !dbg !70
+  alias entry   %198 = getelementptr inbounds i64, i64* %4, i64 %197, !dbg !77
+    alias entry   %239 = bitcast double* %189 to i64*, !dbg !109
+  alias entry   %282 = getelementptr inbounds double, double* %10, i64 %0, !dbg !122
+  alias entry   %286 = getelementptr inbounds double, double* %6, i64 %0, !dbg !125
+  alias entry   %307 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %306, i32 1, !dbg !136
+  alias entry   %309 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %306, i32 0, !dbg !137
+  alias entry   %355 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %354, i32 1, !dbg !136
+  alias entry   %357 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %354, i32 0, !dbg !137
+  alias entry   %403 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %402, i32 1, !dbg !167
+    alias entry   %404 = bitcast double* %403 to i64*, !dbg !168
+  alias entry   %415 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %402, i32 0, !dbg !170
+  alias entry   %417 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %95, i32 1, !dbg !172
+    alias entry   %419 = bitcast double* %417 to i64*, !dbg !174
+  alias entry   %430 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %95, i32 0, !dbg !176
+  alias entry   %434 = getelementptr inbounds i64, i64* %5, i64 %0, !dbg !179
+  alias entry   %462 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %461, i32 1, !dbg !136
+  alias entry   %464 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %461, i32 0, !dbg !137
+Round 1
+Round end
+change loop scale from 32.0 to 1.0
+    load (1.000000e+00) from i64* %2
+    load (1.000000e+00) from i64* %4
+    load (1.000000e+00) from i64* %1
+    load (1.000000e+00) from i64* %1
+    load (5.000000e-01) from %struct.Comm* %7
+    load (5.000000e-01) from %struct.Comm* %7
+    load (8.000000e+00) from %struct.Edge* %3
+    load (4.000000e+00) from %struct.Edge* %3
+    load (8.000000e+00) from i64* %4
+    load (2.500000e+00) from %struct.Edge* %3
+    load (2.750000e+00) from %struct.Edge* %3
+    load (5.000000e-01) from double* %10
+    store (5.000000e-01) to double* %10
+    load (5.000000e-01) from double* %6
+    load (1.236264e-01) from %struct.Comm* %7
+    load (1.236264e-01) from %struct.Comm* %7
+    load (5.000000e+00) from %struct.Comm* %7
+    load (5.000000e+00) from %struct.Comm* %7
+    load (2.500000e-01) from %struct.Comm* %8
+    load (2.500000e-01) from double* %6
+    load (2.500000e-01) from %struct.Comm* %8
+    store (1.000000e+00) to i64* %5
+    load (5.000000e+00) from %struct.Comm* %7
+    load (5.000000e+00) from %struct.Comm* %7
+  Frequency of i64* %1
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Edge* %3
+  load: 1.725000e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %4
+  load: 9.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 0.000000e+00		  store: 1.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %6
+  load: 7.500000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %7
+  load: 2.124725e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %8
+  load: 5.000000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %10
+  load: 5.000000e-01		  store: 5.000000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z24distBuildLocalMapCounterllP7clmap_tRiPdS1_PK4EdgePKllll
+Round 0
+    base alias entry   %83 = select i1 %16, %struct.Edge** %12, %struct.Edge** %82
+  alias entry   %93 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %92, i32 0, !dbg !38
+  alias entry   %94 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %92, i32 1, !dbg !39
+  alias entry   %103 = getelementptr inbounds i64, i64* %7, i64 %102, !dbg !48
+  alias entry   %111 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %110, i32 0, !dbg !53
+  alias entry   %121 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %120, !dbg !61
+  alias entry   %131 = getelementptr inbounds double, double* %4, i64 %125, !dbg !70
+  alias entry   %138 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %120, i32 1, !dbg !75
+  alias entry   %139 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %120, i32 0, !dbg !76
+  alias entry   %146 = getelementptr inbounds double, double* %4, i64 %145, !dbg !83
+    alias entry   %147 = bitcast double* %146 to i64*, !dbg !84
+    alias entry   %148 = bitcast double* %94 to i64*, !dbg !85
+Round 1
+Round end
+change loop scale from 32.0 to 1.0
+    load (5.000000e-01) from %struct.Edge* %6
+    load (2.472527e-01) from %struct.Edge* %6
+    load (5.000000e-01) from i64* %7
+    load (5.000000e-01) from i32* %3
+    load (5.076923e+00) from %struct.clmap_t* %2
+    load (3.076923e-01) from i32* %5
+    load (1.538462e-01) from %struct.Edge* %6
+    load (1.538462e-01) from double* %4
+    store (1.538462e-01) to double* %4
+    store (1.703297e-01) to %struct.clmap_t* %2
+    store (1.703297e-01) to %struct.clmap_t* %2
+    store (1.703297e-01) to i32* %3
+    load (3.406593e-01) from i32* %5
+    load (1.703297e-01) from %struct.Edge* %6
+    store (1.703297e-01) to double* %4
+    store (1.703297e-01) to i32* %5
+  Frequency of %struct.clmap_t* %2
+  load: 5.076923e+00		  store: 3.406593e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %3
+  load: 5.000000e-01		  store: 1.703297e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %4
+  load: 1.538462e-01		  store: 3.241758e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %5
+  load: 6.483516e-01		  store: 1.703297e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Edge* %6
+  load: 1.071429e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %7
+  load: 5.000000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z15distGetMaxIndexP7clmap_tRiPdS1_dPK4Commdldllld
+Round 0
+  alias entry   %22 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 %21
+  alias entry   %28 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 0, !dbg !36
+  alias entry   %33 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %32, i32 1, !dbg !43
+  alias entry   %35 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %32, i32 0, !dbg !46
+  alias entry   %37 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 1, !dbg !48
+  alias entry   %41 = getelementptr inbounds double, double* %2, i64 %38, !dbg !52
+  alias entry   %60 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 1, !dbg !62
+  alias entry   %81 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %80, i32 1, !dbg !43
+  alias entry   %83 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %80, i32 0, !dbg !46
+  alias entry   %89 = getelementptr inbounds double, double* %2, i64 %86, !dbg !52
+  alias entry   %126 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %125, i32 1, !dbg !43
+  alias entry   %128 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %125, i32 0, !dbg !46
+  alias entry   %134 = getelementptr inbounds double, double* %2, i64 %131, !dbg !52
+Round 1
+Round end
+change loop scale from 32.0 to 1.0
+    load (1.000000e+00) from double* %2
+    load (1.000000e+00) from i32* %3
+    load (1.000000e+00) from i32* %1
+    load (5.000000e-01) from %struct.clmap_t* %0
+    load (2.500000e-01) from %struct.Comm* %5
+    load (2.500000e-01) from %struct.Comm* %5
+    load (2.500000e-01) from %struct.clmap_t* %0
+    load (1.250000e-01) from double* %2
+    load (3.125000e-01) from %struct.Comm* %5
+    load (3.125000e-01) from %struct.Comm* %5
+    load (1.562500e-01) from double* %2
+    load (3.125000e-01) from %struct.Comm* %5
+    load (3.125000e-01) from %struct.Comm* %5
+    load (1.562500e-01) from double* %2
+  Frequency of %struct.clmap_t* %0
+  load: 7.500000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %2
+  load: 1.437500e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %3
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %5
+  load: 1.750000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function __omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368
+Round 0
+Round end
+change loop scale from 32.0 to 1.0
+    call (5.076923e+00, 2.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %1
+    call (5.076923e+00, 1.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %2
+    call (5.076923e+00, 1.725000e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Edge* %3
+    call (5.076923e+00, 9.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %4
+    call (5.076923e+00, 0.000000e+00, 1.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %5
+    call (5.076923e+00, 7.500000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using double* %6
+    call (5.076923e+00, 2.124725e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %7
+    call (5.076923e+00, 5.000000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %8
+    call (5.076923e+00, 5.000000e-01, 5.000000e-01, 0.000000e+00, 0.000000e+00) using double* %10
+  Frequency of i64* %1
+  load: 1.015385e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %2
+  load: 5.076923e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Edge* %3
+  load: 8.757692e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %4
+  load: 4.569231e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 0.000000e+00		  store: 5.076923e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %6
+  load: 3.807692e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %7
+  load: 1.078707e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %8
+  load: 2.538462e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %10
+  load: 2.538462e+00		  store: 2.538462e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  ---- Identify Target Regions ----
+  ---- OMP (main.cpp, nvptx64-nvidia-cuda) ----
+Info: ignore malloc
+Info: ignore malloc
+Info: ignore malloc
+Round 0
+Round end
+  ---- Access Frequency Analysis ----
+  ---- Optimization Preparation ----
+  ---- Data Mapping Optimization ----
+1 warning generated.
+  ---- Function Argument Access Frequency CG Analysis ----
+On function _Z7is_pwr2i
+Round 0
+Round end
+On function _Z8reseederj
+Round 0
+Round end
+On function _ZNSt8seed_seq8generateIN9__gnu_cxx17__normal_iteratorIPjSt6vectorIjSaIjEEEEEEvT_S8_
+Round 0
+  alias entry   %18 = getelementptr inbounds %"class.std::seed_seq", %"class.std::seed_seq"* %0, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !10369
+    alias entry   %19 = bitcast i32** %18 to i64*, !dbg !10369
+    alias entry   %21 = bitcast %"class.std::seed_seq"* %0 to i64*, !dbg !10376
+Round 1
+Round end
+    load (6.274510e-01) from %"class.std::seed_seq"* %0
+    load (6.274510e-01) from %"class.std::seed_seq"* %0
+  Frequency of %"class.std::seed_seq"* %0
+  load: 1.254902e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z4lockv
+Round 0
+Round end
+On function _Z6unlockv
+Round 0
+Round end
+On function _Z19distSumVertexDegreeRK5GraphRSt6vectorIdSaIdEERS2_I4CommSaIS6_EE
+Round 0
+  alias entry   %6 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10459
+Round 1
+Round end
+    load (1.000000e+00) from %class.Graph* %0
+  Frequency of %class.Graph* %0
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.10"* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.15"* %2
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function __clang_call_terminate
+Round 0
+Round end
+  Frequency of i8* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined.
+Round 0
+  alias entry   %25 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 0, i32 0, i32 0, i32 0
+  alias entry   %26 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 1, i32 0, i32 0, i32 0
+  alias entry   %27 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %28 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %5, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (6.350000e+00) from %class.Graph* %3
+    load (6.350000e+00) from %class.Graph* %3
+    load (6.350000e+00) from %"class.std::vector.10"* %4
+    load (6.350000e+00) from %"class.std::vector.15"* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %class.Graph* %3
+  load: 1.270000e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.10"* %4
+  load: 6.350000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.15"* %5
+  load: 6.350000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z29distCalcConstantForSecondTermRKSt6vectorIdSaIdEEP19ompi_communicator_t
+Round 0
+  alias entry   %9 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10283
+    alias entry   %10 = bitcast double** %9 to i64*, !dbg !10283
+    alias entry   %12 = bitcast %"class.std::vector.10"* %0 to i64*, !dbg !10288
+Round 1
+Round end
+    load (1.000000e+00) from %"class.std::vector.10"* %0
+    load (1.000000e+00) from %"class.std::vector.10"* %0
+  Frequency of %"class.std::vector.10"* %0
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.ompi_communicator_t* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func
+Round 0
+    alias entry   %3 = bitcast i8* %1 to double**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to double**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..2
+Round 0
+  alias entry   %32 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 0
+    alias entry   %98 = bitcast double* %3 to i64*, !dbg !10325
+Round 1
+Round end
+    load (3.157895e-01) from %"class.std::vector.10"* %4
+    load (2.105263e-01) from double* %3
+    store (2.105263e-01) to double* %3
+    load (2.105263e-01) from double* %3
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %3
+  load: 4.210526e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.10"* %4
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z12distInitCommRSt6vectorIlSaIlEES2_l
+Round 0
+  alias entry   %6 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10273
+    alias entry   %7 = bitcast i64** %6 to i64*, !dbg !10273
+    alias entry   %9 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10280
+Round 1
+Round end
+    load (1.000000e+00) from %"class.std::vector.0"* %1
+    load (1.000000e+00) from %"class.std::vector.0"* %1
+  Frequency of %"class.std::vector.0"* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %1
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..4
+Round 0
+  alias entry   %29 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+  alias entry   %31 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (3.200000e-01) from %"class.std::vector.0"* %3
+    load (3.200000e-01) from %"class.std::vector.0"* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %3
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %5
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z15distInitLouvainRK5GraphRSt6vectorIlSaIlEES5_RS2_IdSaIdEES8_RS2_I4CommSaIS9_EESC_Rdi
+Round 0
+  alias entry   %16 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !10485
+  alias entry   %20 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10502
+  alias entry   %22 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10514
+  alias entry   %24 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %3, i64 0, i32 0, i32 0, i32 1, !dbg !10532
+    alias entry   %25 = bitcast double** %24 to i64*, !dbg !10532
+    alias entry   %27 = bitcast %"class.std::vector.10"* %3 to i64*, !dbg !10536
+  alias entry   %40 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10572
+    alias entry   %41 = bitcast i64** %40 to i64*, !dbg !10572
+    alias entry   %43 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10574
+  alias entry   %56 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %2, i64 0, i32 0, i32 0, i32 1, !dbg !10600
+    alias entry   %57 = bitcast i64** %56 to i64*, !dbg !10600
+    alias entry   %59 = bitcast %"class.std::vector.0"* %2 to i64*, !dbg !10601
+  alias entry   %72 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %4, i64 0, i32 0, i32 0, i32 1, !dbg !10622
+    alias entry   %73 = bitcast double** %72 to i64*, !dbg !10622
+    alias entry   %75 = bitcast %"class.std::vector.10"* %4 to i64*, !dbg !10623
+  alias entry   %88 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %5, i64 0, i32 0, i32 0, i32 1, !dbg !10654
+    alias entry   %89 = bitcast %struct.Comm** %88 to i64*, !dbg !10654
+    alias entry   %91 = bitcast %"class.std::vector.15"* %5 to i64*, !dbg !10658
+  alias entry   %104 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %6, i64 0, i32 0, i32 0, i32 1, !dbg !10685
+    alias entry   %105 = bitcast %struct.Comm** %104 to i64*, !dbg !10685
+    alias entry   %107 = bitcast %"class.std::vector.15"* %6 to i64*, !dbg !10686
+Round 1
+Round end
+    load (1.000000e+00) from %class.Graph* %0
+    load (1.000000e+00) from %class.Graph* %0
+    load (1.000000e+00) from %class.Graph* %0
+    load (1.000000e+00) from %"class.std::vector.10"* %3
+    load (1.000000e+00) from %"class.std::vector.10"* %3
+Warning: wrong traversal order, or recursive call
+On function _Z15distGetMaxIndexP7clmap_tRiPdS1_dPK4Commdldllld
+Round 0
+  alias entry   %22 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 %21
+  alias entry   %28 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 0, !dbg !10320
+  alias entry   %33 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %32, i32 1, !dbg !10330
+  alias entry   %35 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %32, i32 0, !dbg !10333
+  alias entry   %37 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 0, i32 1, !dbg !10335
+  alias entry   %41 = getelementptr inbounds double, double* %2, i64 %38, !dbg !10340
+  alias entry   %60 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %0, i64 1, !dbg !10352
+  alias entry   %80 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %79, i32 1, !dbg !10330
+  alias entry   %82 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %79, i32 0, !dbg !10333
+  alias entry   %88 = getelementptr inbounds double, double* %2, i64 %85, !dbg !10340
+  alias entry   %124 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %123, i32 1, !dbg !10330
+  alias entry   %126 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %123, i32 0, !dbg !10333
+  alias entry   %132 = getelementptr inbounds double, double* %2, i64 %129, !dbg !10340
+Round 1
+Round end
+    load (1.000000e+00) from double* %2
+    load (1.000000e+00) from i32* %3
+    load (1.000000e+00) from i32* %1
+    load (5.000000e-01) from %struct.clmap_t* %0
+    load (2.500000e-01) from %struct.Comm* %5
+    load (2.500000e-01) from %struct.Comm* %5
+    load (2.500000e-01) from %struct.clmap_t* %0
+    load (1.250000e-01) from double* %2
+    load (9.984375e+00) from %struct.Comm* %5
+    load (9.984375e+00) from %struct.Comm* %5
+    load (4.984375e+00) from double* %2
+    load (9.984375e+00) from %struct.Comm* %5
+    load (9.984375e+00) from %struct.Comm* %5
+    load (4.984375e+00) from double* %2
+  Frequency of %struct.clmap_t* %0
+  load: 7.500000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %2
+  load: 1.109375e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %3
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %5
+  load: 4.043750e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z24distBuildLocalMapCounterllP7clmap_tRiPdS1_PK4EdgePKllll
+Round 0
+  alias entry   %20 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %19, i32 0, !dbg !10308
+  alias entry   %21 = getelementptr inbounds %struct.Edge, %struct.Edge* %6, i64 %19, i32 1, !dbg !10310
+  alias entry   %30 = getelementptr inbounds i64, i64* %7, i64 %29, !dbg !10326
+  alias entry   %37 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %35, i32 0, !dbg !10337
+  alias entry   %45 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %34, !dbg !10348
+  alias entry   %55 = getelementptr inbounds double, double* %4, i64 %49, !dbg !10358
+  alias entry   %61 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %34, i32 0, !dbg !10364
+  alias entry   %62 = getelementptr inbounds %struct.clmap_t, %struct.clmap_t* %2, i64 %34, i32 1, !dbg !10367
+    alias entry   %68 = bitcast double* %21 to i64*, !dbg !10375
+  alias entry   %71 = getelementptr inbounds double, double* %4, i64 %70, !dbg !10377
+    alias entry   %72 = bitcast double* %71 to i64*, !dbg !10378
+Round 1
+Round end
+    load (1.593750e+01) from %struct.Edge* %6
+    load (7.937500e+00) from %struct.Edge* %6
+    load (1.593750e+01) from i64* %7
+    load (1.593750e+01) from i32* %3
+    load (1.625000e+02) from %struct.clmap_t* %2
+    load (9.937500e+00) from i32* %5
+    load (4.937500e+00) from %struct.Edge* %6
+    load (4.937500e+00) from double* %4
+    store (4.937500e+00) to double* %4
+    store (5.437500e+00) to %struct.clmap_t* %2
+    store (5.437500e+00) to %struct.clmap_t* %2
+    store (5.437500e+00) to i32* %3
+    load (1.093750e+01) from i32* %5
+    load (5.437500e+00) from %struct.Edge* %6
+    store (5.437500e+00) to double* %4
+    store (5.437500e+00) to i32* %5
+  Frequency of %struct.clmap_t* %2
+  load: 1.625000e+02		  store: 1.087500e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %3
+  load: 1.593750e+01		  store: 5.437500e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %4
+  load: 4.937500e+00		  store: 1.037500e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %5
+  load: 2.087500e+01		  store: 5.437500e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Edge* %6
+  load: 3.425000e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %7
+  load: 1.593750e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z27distExecuteLouvainIterationlPKlS0_PK4EdgeS0_PlPKdP4CommS8_dPdi
+Round 0
+  alias entry   %18 = getelementptr inbounds i64, i64* %2, i64 %17, !dbg !10316
+  alias entry   %20 = getelementptr inbounds i64, i64* %4, i64 %0, !dbg !10322
+  alias entry   %23 = getelementptr inbounds i64, i64* %1, i64 %0, !dbg !10329
+  alias entry   %26 = getelementptr inbounds i64, i64* %1, i64 %25, !dbg !10332
+  alias entry   %30 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %22, i32 0, !dbg !10337
+  alias entry   %32 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %22, i32 1, !dbg !10341
+  alias entry   %47 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %46, i32 0, !dbg !10401
+  alias entry   %48 = getelementptr inbounds %struct.Edge, %struct.Edge* %3, i64 %46, i32 1, !dbg !10403
+  alias entry   %57 = getelementptr inbounds i64, i64* %4, i64 %56, !dbg !10414
+    alias entry   %93 = bitcast double* %48 to i64*, !dbg !10457
+  alias entry   %116 = getelementptr inbounds double, double* %10, i64 %0, !dbg !10470
+  alias entry   %120 = getelementptr inbounds double, double* %6, i64 %0, !dbg !10473
+  alias entry   %137 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %136, i32 1, !dbg !10533
+  alias entry   %139 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %136, i32 0, !dbg !10534
+  alias entry   %183 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %182, i32 1, !dbg !10533
+  alias entry   %185 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %182, i32 0, !dbg !10534
+  alias entry   %230 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %229, i32 1, !dbg !10572
+    alias entry   %231 = bitcast double* %230 to i64*, !dbg !10573
+  alias entry   %242 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %229, i32 0, !dbg !10575
+  alias entry   %244 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %22, i32 1, !dbg !10578
+    alias entry   %246 = bitcast double* %244 to i64*, !dbg !10581
+  alias entry   %257 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %22, i32 0, !dbg !10583
+  alias entry   %261 = getelementptr inbounds i64, i64* %5, i64 %0, !dbg !10587
+  alias entry   %264 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %263, i32 1, !dbg !10533
+  alias entry   %266 = getelementptr inbounds %struct.Comm, %struct.Comm* %7, i64 %263, i32 0, !dbg !10534
+Round 1
+Round end
+    load (1.000000e+00) from i64* %2
+    load (1.000000e+00) from i64* %4
+    load (1.000000e+00) from i64* %1
+    load (1.000000e+00) from i64* %1
+    load (5.000000e-01) from %struct.Comm* %7
+    load (5.000000e-01) from %struct.Comm* %7
+    load (7.992188e+00) from %struct.Edge* %3
+    load (3.992188e+00) from %struct.Edge* %3
+    load (7.992188e+00) from i64* %4
+    load (2.492188e+00) from %struct.Edge* %3
+    load (2.742188e+00) from %struct.Edge* %3
+    load (5.000000e-01) from double* %10
+    store (5.000000e-01) to double* %10
+    load (5.000000e-01) from double* %6
+    load (1.250000e-01) from %struct.Comm* %7
+    load (1.250000e-01) from %struct.Comm* %7
+    load (4.992188e+00) from %struct.Comm* %7
+    load (4.992188e+00) from %struct.Comm* %7
+    load (2.500000e-01) from %struct.Comm* %8
+    load (2.500000e-01) from double* %6
+    load (2.500000e-01) from %struct.Comm* %8
+    store (1.000000e+00) to i64* %5
+    load (4.992188e+00) from %struct.Comm* %7
+    load (4.992188e+00) from %struct.Comm* %7
+  Frequency of i64* %1
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Edge* %3
+  load: 1.721875e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %4
+  load: 8.992188e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 0.000000e+00		  store: 1.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %6
+  load: 7.500000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %7
+  load: 2.121875e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %8
+  load: 5.000000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %10
+  load: 5.000000e-01		  store: 5.000000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z21distComputeModularityRK5GraphP4CommPKddi
+Round 0
+  alias entry   %14 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10288
+  alias entry   %16 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10304
+    base alias entry   %35 = bitcast i8** %34 to double**, !dbg !10317
+    base alias entry   %37 = bitcast i8** %36 to double**, !dbg !10317
+    base alias entry   %45 = bitcast i8** %44 to %struct.Comm**, !dbg !10317
+    base alias entry   %47 = bitcast i8** %46 to %struct.Comm**, !dbg !10317
+Round 1
+  base alias entry   %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317
+  base alias entry   %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317
+  base alias entry   %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317
+  base alias entry   %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317
+Round 2
+  base alias offset entry (2)   %11 = alloca [5 x i8*], align 8
+  base alias offset entry (2)   %12 = alloca [5 x i8*], align 8
+  base alias offset entry (-1)   %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317
+  base alias offset entry (-1)   %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317
+  base alias offset entry (4)   %11 = alloca [5 x i8*], align 8
+  base alias offset entry (-2)   %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317
+  base alias offset entry (4)   %12 = alloca [5 x i8*], align 8
+  base alias offset entry (-2)   %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317
+Round 3
+  base alias offset entry (4)   %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317
+  base alias offset entry (2)   %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317
+  base alias offset entry (4)   %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317
+  base alias offset entry (2)   %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317
+  base alias offset entry (3)   %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317
+  base alias offset entry (1)   %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317
+  base alias offset entry (3)   %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317
+  base alias offset entry (1)   %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317
+  base alias offset entry (2)   %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317
+  base alias offset entry (2)   %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317
+  base alias offset entry (1)   %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317
+  base alias offset entry (1)   %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317
+Round 4
+Round end
+    load (1.000000e+00) from %class.Graph* %0
+    load (1.000000e+00) from %class.Graph* %0
+  Frequency of %class.Graph* %0
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %2
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.7
+Round 0
+    alias entry   %3 = bitcast i8* %1 to double**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to double**, !dbg !10261
+  alias entry   %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261
+    alias entry   %8 = bitcast i8* %7 to double**, !dbg !10261
+  alias entry   %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261
+    alias entry   %11 = bitcast i8* %10 to double**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..8
+Round 0
+  alias entry   %39 = getelementptr inbounds double, double* %6, i64 %38, !dbg !10318
+  alias entry   %42 = getelementptr inbounds %struct.Comm, %struct.Comm* %8, i64 %38, i32 1, !dbg !10321
+    alias entry   %62 = bitcast double* %5 to i64*, !dbg !10329
+    alias entry   %74 = bitcast double* %7 to i64*, !dbg !10329
+Round 1
+Round end
+    load (1.010526e+01) from double* %6
+    load (1.010526e+01) from %struct.Comm* %8
+    load (2.105263e-01) from double* %5
+    store (2.105263e-01) to double* %5
+    load (2.105263e-01) from double* %7
+    store (2.105263e-01) to double* %7
+    load (2.105263e-01) from double* %5
+    load (2.105263e-01) from double* %7
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %5
+  load: 4.210526e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %6
+  load: 1.010526e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %7
+  load: 4.210526e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %8
+  load: 1.010526e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.9
+Round 0
+    alias entry   %3 = bitcast i8* %1 to double**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to double**, !dbg !10261
+  alias entry   %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261
+    alias entry   %8 = bitcast i8* %7 to double**, !dbg !10261
+  alias entry   %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261
+    alias entry   %11 = bitcast i8* %10 to double**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..10
+Round 0
+    alias entry   %65 = bitcast double* %3 to i64*, !dbg !10310
+    alias entry   %77 = bitcast double* %5 to i64*, !dbg !10310
+Round 1
+Round end
+    load (2.916667e-01) from double* %3
+    store (2.916667e-01) to double* %3
+    load (2.916667e-01) from double* %5
+    store (2.916667e-01) to double* %5
+    load (3.333333e-01) from double* %3
+    load (3.333333e-01) from double* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %3
+  load: 6.250000e-01		  store: 2.916667e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %4
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %5
+  load: 6.250000e-01		  store: 2.916667e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %6
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z20distUpdateLocalCinfolP4CommPKS_
+Round 0
+    base alias entry   %15 = bitcast i8** %14 to %struct.Comm**, !dbg !10269
+    base alias entry   %17 = bitcast i8** %16 to %struct.Comm**, !dbg !10269
+    base alias entry   %20 = bitcast i8** %19 to %struct.Comm**, !dbg !10269
+    base alias entry   %22 = bitcast i8** %21 to %struct.Comm**, !dbg !10269
+Round 1
+  base alias entry   %14 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269
+  base alias entry   %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269
+  base alias entry   %19 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269
+  base alias entry   %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269
+Round 2
+  base alias offset entry (1)   %5 = alloca [3 x i8*], align 8
+  base alias offset entry (1)   %6 = alloca [3 x i8*], align 8
+  base alias offset entry (2)   %5 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %19 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269
+  base alias offset entry (2)   %6 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269
+Round 3
+  base alias offset entry (1)   %9 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269
+  base alias offset entry (2)   %9 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269
+  base alias offset entry (1)   %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269
+  base alias offset entry (2)   %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269
+  base alias offset entry (1)   %14 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269
+  base alias offset entry (1)   %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269
+Round 4
+Round end
+  Frequency of %struct.Comm* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %2
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..13
+Round 0
+  alias entry   %33 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %26, !dbg !10304
+  alias entry   %34 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %25, i32 1, !dbg !10304
+  alias entry   %35 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %26, !dbg !10304
+  alias entry   %36 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %25, i32 1, !dbg !10304
+  alias entry   %37 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %26, i32 1, !dbg !10304
+  alias entry   %38 = getelementptr %struct.Comm, %struct.Comm* %5, i64 %29, !dbg !10304
+  alias entry   %39 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %26, i32 1, !dbg !10304
+  alias entry   %40 = getelementptr %struct.Comm, %struct.Comm* %6, i64 %29, !dbg !10304
+    alias entry   %41 = bitcast double* %36 to %struct.Comm*, !dbg !10304
+    alias entry   %43 = bitcast double* %34 to %struct.Comm*, !dbg !10304
+    alias entry   %46 = bitcast %struct.Comm* %40 to double*, !dbg !10304
+    alias entry   %48 = bitcast %struct.Comm* %38 to double*, !dbg !10304
+  alias entry   %63 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %57, i32 0, !dbg !10304
+  alias entry   %64 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %58, i32 0, !dbg !10304
+  alias entry   %65 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %59, i32 0, !dbg !10304
+  alias entry   %66 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %60, i32 0, !dbg !10304
+  alias entry   %67 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %61, i32 0, !dbg !10304
+  alias entry   %68 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %62, i32 0, !dbg !10304
+    alias entry   %69 = bitcast i64* %63 to <4 x i64>*, !dbg !10304
+    alias entry   %70 = bitcast i64* %64 to <4 x i64>*, !dbg !10304
+    alias entry   %71 = bitcast i64* %65 to <4 x i64>*, !dbg !10304
+    alias entry   %72 = bitcast i64* %66 to <4 x i64>*, !dbg !10304
+    alias entry   %73 = bitcast i64* %67 to <4 x i64>*, !dbg !10304
+    alias entry   %74 = bitcast i64* %68 to <4 x i64>*, !dbg !10304
+  alias entry   %93 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %57, i32 0, !dbg !10307
+  alias entry   %94 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %58, i32 0, !dbg !10307
+  alias entry   %95 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %59, i32 0, !dbg !10307
+  alias entry   %96 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %60, i32 0, !dbg !10307
+  alias entry   %97 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %61, i32 0, !dbg !10307
+  alias entry   %98 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %62, i32 0, !dbg !10307
+    alias entry   %99 = bitcast i64* %93 to <4 x i64>*, !dbg !10307
+    alias entry   %100 = bitcast i64* %94 to <4 x i64>*, !dbg !10307
+    alias entry   %101 = bitcast i64* %95 to <4 x i64>*, !dbg !10307
+    alias entry   %102 = bitcast i64* %96 to <4 x i64>*, !dbg !10307
+    alias entry   %103 = bitcast i64* %97 to <4 x i64>*, !dbg !10307
+    alias entry   %104 = bitcast i64* %98 to <4 x i64>*, !dbg !10307
+  alias entry   %135 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %57, i32 1, !dbg !10309
+  alias entry   %136 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %58, i32 1, !dbg !10309
+  alias entry   %137 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %59, i32 1, !dbg !10309
+  alias entry   %138 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %60, i32 1, !dbg !10309
+  alias entry   %139 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %61, i32 1, !dbg !10309
+  alias entry   %140 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %62, i32 1, !dbg !10309
+  alias entry   %147 = getelementptr inbounds double, double* %135, i64 -1, !dbg !10309
+    alias entry   %148 = bitcast double* %147 to <4 x double>*, !dbg !10309
+  alias entry   %149 = getelementptr inbounds double, double* %136, i64 -1, !dbg !10309
+    alias entry   %150 = bitcast double* %149 to <4 x double>*, !dbg !10309
+  alias entry   %151 = getelementptr inbounds double, double* %137, i64 -1, !dbg !10309
+    alias entry   %152 = bitcast double* %151 to <4 x double>*, !dbg !10309
+  alias entry   %153 = getelementptr inbounds double, double* %138, i64 -1, !dbg !10309
+    alias entry   %154 = bitcast double* %153 to <4 x double>*, !dbg !10309
+  alias entry   %155 = getelementptr inbounds double, double* %139, i64 -1, !dbg !10309
+    alias entry   %156 = bitcast double* %155 to <4 x double>*, !dbg !10309
+  alias entry   %157 = getelementptr inbounds double, double* %140, i64 -1, !dbg !10309
+    alias entry   %158 = bitcast double* %157 to <4 x double>*, !dbg !10309
+  alias entry   %178 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %177, i32 0, !dbg !10304
+  alias entry   %180 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %177, i32 0, !dbg !10307
+  alias entry   %183 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %177, i32 1, !dbg !10318
+  alias entry   %185 = getelementptr inbounds %struct.Comm, %struct.Comm* %5, i64 %177, i32 1, !dbg !10309
+Round 1
+Round end
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %6
+    load (2.500000e+00) from %struct.Comm* %5
+    load (2.500000e+00) from %struct.Comm* %5
+    load (2.500000e+00) from %struct.Comm* %5
+    load (2.500000e+00) from %struct.Comm* %5
+    load (2.500000e+00) from %struct.Comm* %5
+    load (2.500000e+00) from %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    store (2.500000e+00) to %struct.Comm* %5
+    load (9.088235e+00) from %struct.Comm* %6
+    load (9.088235e+00) from %struct.Comm* %5
+    store (9.088235e+00) to %struct.Comm* %5
+    load (9.088235e+00) from %struct.Comm* %6
+    load (9.088235e+00) from %struct.Comm* %5
+    store (9.088235e+00) to %struct.Comm* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %5
+  load: 3.317647e+01		  store: 3.317647e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %6
+  load: 3.317647e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..14
+Round 0
+Round end
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %3
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %4
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z16distCleanCWandCUlPdP4Comm
+Round 0
+    base alias entry   %17 = bitcast i8** %16 to double**, !dbg !10269
+    base alias entry   %19 = bitcast i8** %18 to double**, !dbg !10269
+    base alias entry   %22 = bitcast i8** %21 to %struct.Comm**, !dbg !10269
+    base alias entry   %24 = bitcast i8** %23 to %struct.Comm**, !dbg !10269
+Round 1
+  base alias entry   %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269
+  base alias entry   %18 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269
+  base alias entry   %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269
+  base alias entry   %23 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269
+Round 2
+  base alias offset entry (1)   %5 = alloca [3 x i8*], align 8
+  base alias offset entry (1)   %6 = alloca [3 x i8*], align 8
+  base alias offset entry (2)   %5 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %21 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 2, !dbg !10269
+  base alias offset entry (2)   %6 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %23 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 2, !dbg !10269
+Round 3
+  base alias offset entry (1)   %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269
+  base alias offset entry (2)   %11 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 0, !dbg !10269
+  base alias offset entry (1)   %13 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269
+  base alias offset entry (2)   %13 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 0, !dbg !10269
+  base alias offset entry (1)   %16 = getelementptr inbounds [3 x i8*], [3 x i8*]* %5, i64 0, i64 1, !dbg !10269
+  base alias offset entry (1)   %18 = getelementptr inbounds [3 x i8*], [3 x i8*]* %6, i64 0, i64 1, !dbg !10269
+Round 4
+Round end
+  Frequency of double* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %2
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..18
+Round 0
+  alias entry   %29 = getelementptr inbounds double, double* %5, i64 %28, !dbg !10304
+  alias entry   %30 = getelementptr inbounds %struct.Comm, %struct.Comm* %6, i64 %28, i32 0, !dbg !10309
+    alias entry   %33 = bitcast i64* %30 to i8*, !dbg !10299
+Round 1
+Round end
+    store (1.058333e+01) to double* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %5
+  load: 0.000000e+00		  store: 1.058333e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %6
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..19
+Round 0
+Round end
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %3
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %4
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z21fillRemoteCommunitiesRK5GraphiiRKmS3_RKSt6vectorIlSaIlEES8_S8_S8_S8_RKS4_I4CommSaIS9_EERSt3mapIlS9_St4lessIlESaISt4pairIKlS9_EEERSt13unordered_mapIllSt4hashIlESt8equal_toIlESaISH_ISI_lEEESM_
+Round 0
+  alias entry   %126 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !11433
+  alias entry   %130 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !11449
+  alias entry   %132 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !11460
+  alias entry   %190 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0
+  alias entry   %197 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0
+  alias entry   %299 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 2, i32 0, !dbg !11792
+    alias entry   %300 = bitcast %"struct.std::__detail::_Hash_node_base"* %299 to %"struct.std::__detail::_Hash_node"**, !dbg !11793
+    alias entry   %308 = bitcast %"class.std::unordered_map"* %12 to i8**, !dbg !11836
+  alias entry   %310 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 1, !dbg !11842
+    alias entry   %313 = bitcast %"struct.std::__detail::_Hash_node_base"* %299 to i8*, !dbg !11846
+  alias entry   %316 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %8, i64 0, i32 0, i32 0, i32 0
+  alias entry   %317 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0
+  alias entry   %318 = getelementptr inbounds %"class.std::unordered_map", %"class.std::unordered_map"* %12, i64 0, i32 0, i32 0
+  alias entry   %319 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6
+    alias entry   %320 = bitcast %"class.std::vector.0"* %319 to i64*
+  alias entry   %321 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1
+    alias entry   %322 = bitcast i64** %321 to i64*
+  alias entry   %325 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %9, i64 0, i32 0, i32 0, i32 0
+  alias entry   %326 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6
+    alias entry   %327 = bitcast %"class.std::vector.0"* %326 to i64*
+  alias entry   %328 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1
+    alias entry   %329 = bitcast i64** %328 to i64*
+  alias entry   %800 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, !dbg !13393
+  alias entry   %801 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 1, !dbg !13405
+    alias entry   %802 = bitcast %"struct.std::_Rb_tree_node_base"** %801 to %"struct.std::_Rb_tree_node"**, !dbg !13405
+  alias entry   %808 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, !dbg !13419
+  alias entry   %809 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13425
+    base alias entry   %809 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13425
+  alias entry   %810 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13435
+    base alias entry   %810 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13435
+  alias entry   %811 = getelementptr inbounds %"class.std::map", %"class.std::map"* %11, i64 0, i32 0, i32 0, i32 2, !dbg !13437
+  alias entry   %812 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, !dbg !13442
+  alias entry   %813 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 1, !dbg !13447
+    alias entry   %814 = bitcast %"struct.std::_Rb_tree_node_base"** %813 to %"struct.std::_Rb_tree_node"**, !dbg !13447
+  alias entry   %820 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, !dbg !13452
+  alias entry   %821 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13455
+    base alias entry   %821 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !13455
+  alias entry   %822 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13462
+    base alias entry   %822 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 1, i32 3, !dbg !13462
+  alias entry   %823 = getelementptr inbounds %"class.std::map", %"class.std::map"* %13, i64 0, i32 0, i32 0, i32 2, !dbg !13464
+    alias entry   %828 = bitcast %"struct.std::_Rb_tree_node_base"** %801 to i64*
+    alias entry   %830 = bitcast %"struct.std::_Rb_tree_node_base"* %808 to %"struct.std::_Rb_tree_node"*
+    alias entry   %832 = bitcast %"struct.std::_Rb_tree_node_base"** %813 to i64*
+    alias entry   %834 = bitcast %"struct.std::_Rb_tree_node_base"* %820 to %"struct.std::_Rb_tree_node"*
+    alias entry   %943 = load %"struct.std::_Rb_tree_node_base"*, %"struct.std::_Rb_tree_node_base"** %809, align 8, !dbg !14017, !tbaa !14018
+    alias entry   %998 = load %"struct.std::_Rb_tree_node_base"*, %"struct.std::_Rb_tree_node_base"** %821, align 8, !dbg !14306, !tbaa !14018
+Round 1
+Round end
+    load (1.000000e+00) from i64* %4
+    load (9.999994e-01) from i64* %3
+    load (9.999963e-01) from %class.Graph* %0
+    load (9.999963e-01) from %class.Graph* %0
+    load (9.999963e-01) from %class.Graph* %0
+    load (9.999803e+00) from %"class.std::vector.0"* %6
+    load (1.999960e+01) from %"class.std::vector.0"* %6
+    load (6.249782e+00) from %"class.std::vector.0"* %5
+    load (1.249956e+01) from %"class.std::vector.0"* %5
+    load (9.999777e-01) from %"class.std::unordered_map"* %12
+    load (9.999777e-01) from %"class.std::unordered_map"* %12
+    load (9.999777e-01) from %"class.std::unordered_map"* %12
+    load (1.999809e+01) from %"class.std::vector.0"* %8
+    load (1.999807e+01) from %"class.std::unordered_map"* %12
+    load (1.999807e+01) from %"class.std::unordered_map"* %12
+Warning: wrong traversal order, or recursive call
+On function .omp_outlined..22
+Round 0
+  alias entry   %31 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+  alias entry   %33 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %35 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i64* %2
+    load (3.200000e-01) from %"class.std::vector.0"* %3
+    load (3.200000e-01) from %"class.std::vector.0"* %4
+    load (3.200000e-01) from %"class.std::vector.0"* %6
+    load (1.020000e+01) from i64* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %3
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %4
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 1.020000e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %6
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.24
+Round 0
+    alias entry   %3 = bitcast i8* %1 to i64**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to i64**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..25
+Round 0
+  alias entry   %33 = getelementptr inbounds %"class.std::vector.29", %"class.std::vector.29"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %35 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i32* %2
+    load (3.157895e-01) from %"class.std::vector.29"* %4
+    load (3.157895e-01) from %"class.std::vector.0"* %3
+    load (2.105263e-01) from i64* %5
+    store (2.105263e-01) to i64* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %3
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.29"* %4
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 2.105263e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.27
+Round 0
+    alias entry   %3 = bitcast i8* %1 to i64**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to i64**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..28
+Round 0
+  alias entry   %32 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i32* %2
+    load (3.157895e-01) from %"class.std::vector.0"* %4
+    load (2.105263e-01) from i64* %3
+    store (2.105263e-01) to i64* %3
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %3
+  load: 2.105263e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %4
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..30
+Round 0
+  alias entry   %20 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %2, i64 0, i32 0, i32 0, i32 0, !dbg !10503
+  alias entry   %34 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %35 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %7, i64 0, i32 0, i32 0, i32 0
+  alias entry   %36 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %6, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from %"class.std::vector.0"* %2
+    load (2.047500e+02) from %"class.std::vector.0"* %4
+    load (2.047500e+02) from %"class.std::vector.15"* %7
+    load (2.047500e+02) from %"class.std::vector.52"* %6
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %4
+  load: 2.047500e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.52"* %6
+  load: 2.047500e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.15"* %7
+  load: 2.047500e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z22createCommunityMPITypev
+Round 0
+Round end
+On function _Z23destroyCommunityMPITypev
+Round 0
+Round end
+On function _Z23updateRemoteCommunitiesRK5GraphRSt6vectorI4CommSaIS3_EERKSt3mapIlS3_St4lessIlESaISt4pairIKlS3_EEEii
+Round 0
+  alias entry   %19 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 0, !dbg !10869
+  alias entry   %46 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !11050
+  alias entry   %48 = getelementptr inbounds %"class.std::map", %"class.std::map"* %2, i64 0, i32 0, i32 0, i32 1, i32 2, !dbg !11068
+    alias entry   %49 = bitcast %"struct.std::_Rb_tree_node_base"** %48 to i64*, !dbg !11068
+  alias entry   %51 = getelementptr inbounds %"class.std::map", %"class.std::map"* %2, i64 0, i32 0, i32 0, i32 1, !dbg !11085
+  alias entry   %55 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6
+    alias entry   %56 = bitcast %"class.std::vector.0"* %55 to i64*
+  alias entry   %57 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1
+    alias entry   %58 = bitcast i64** %57 to i64*
+Round 1
+Round end
+    load (1.000000e+00) from %class.Graph* %0
+    load (9.999994e-01) from %class.Graph* %0
+    load (9.999994e-01) from %"class.std::map"* %2
+    load (1.999985e+01) from %class.Graph* %0
+    load (1.999985e+01) from %class.Graph* %0
+  Frequency of %class.Graph* %0
+  load: 4.199970e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.15"* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::map"* %2
+  load: 9.999994e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..32
+Round 0
+  alias entry   %28 = getelementptr inbounds %"class.std::vector.66", %"class.std::vector.66"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %30 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i32* %2
+    load (3.137255e-01) from %"class.std::vector.66"* %4
+    load (3.137255e-01) from %"class.std::vector.0"* %3
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %3
+  load: 3.137255e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.66"* %4
+  load: 3.137255e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.34
+Round 0
+    alias entry   %3 = bitcast i8* %1 to i64**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to i64**, !dbg !10261
+  alias entry   %7 = getelementptr inbounds i8, i8* %1, i64 8, !dbg !10261
+    alias entry   %8 = bitcast i8* %7 to i64**, !dbg !10261
+  alias entry   %10 = getelementptr inbounds i8, i8* %0, i64 8, !dbg !10261
+    alias entry   %11 = bitcast i8* %10 to i64**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 2.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..35
+Round 0
+  alias entry   %36 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0
+  alias entry   %38 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i32* %2
+    load (3.157895e-01) from %"class.std::vector.0"* %4
+    load (3.157895e-01) from %"class.std::vector.0"* %6
+    load (2.105263e-01) from i64* %3
+    store (2.105263e-01) to i64* %3
+    load (2.105263e-01) from i64* %5
+    store (2.105263e-01) to i64* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %3
+  load: 2.105263e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %4
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 2.105263e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %6
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..37
+Round 0
+  alias entry   %26 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %3, i64 0, i32 0, i32 0, i32 0
+  alias entry   %27 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %4, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i64* %2
+    load (6.350000e+00) from %"class.std::vector.52"* %3
+    load (6.350000e+00) from %"class.std::vector.15"* %4
+    load (6.350000e+00) from i64* %5
+    load (2.047500e+02) from i64* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.52"* %3
+  load: 6.350000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.15"* %4
+  load: 6.350000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 2.111000e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z18exchangeVertexReqsRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ii
+Round 0
+  alias entry   %15 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10306
+  alias entry   %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10319
+  alias entry   %51 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 1, !dbg !10485
+    alias entry   %52 = bitcast i64** %51 to i64*, !dbg !10485
+    alias entry   %54 = bitcast %"class.std::vector.0"* %4 to i64*, !dbg !10489
+  alias entry   %71 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 1, !dbg !10517
+    alias entry   %72 = bitcast i64** %71 to i64*, !dbg !10517
+    alias entry   %74 = bitcast %"class.std::vector.0"* %3 to i64*, !dbg !10518
+    alias entry   %91 = bitcast %"class.std::vector.0"* %3 to i8**
+  alias entry   %94 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+  alias entry   %98 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0, !dbg !10598
+    alias entry   %99 = bitcast %"class.std::vector.0"* %4 to i8**, !dbg !10598
+  alias entry   %128 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 1, !dbg !10673
+    alias entry   %129 = bitcast i64** %128 to i64*, !dbg !10673
+    alias entry   %131 = bitcast %"class.std::vector.0"* %5 to i64*, !dbg !10674
+  alias entry   %147 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 1, !dbg !10696
+    alias entry   %148 = bitcast i64** %147 to i64*, !dbg !10696
+    alias entry   %150 = bitcast %"class.std::vector.0"* %6 to i64*, !dbg !10697
+  alias entry   %190 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 0
+  alias entry   %249 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 0
+  alias entry   %306 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %5, i64 0, i32 0, i32 0, i32 2, !dbg !11244
+  alias entry   %307 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %6, i64 0, i32 0, i32 0, i32 2, !dbg !11245
+    alias entry   %308 = bitcast i64** %306 to i64*, !dbg !11249
+    alias entry   %310 = bitcast i64** %307 to i64*, !dbg !11250
+  alias entry   %316 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 2, !dbg !11279
+  alias entry   %317 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 2, !dbg !11280
+    alias entry   %318 = bitcast i64** %316 to i64*, !dbg !11284
+    alias entry   %320 = bitcast i64** %317 to i64*, !dbg !11285
+Round 1
+Round end
+    load (1.000000e+00) from %class.Graph* %0
+    load (1.000000e+00) from %class.Graph* %0
+    load (9.999984e-01) from %"class.std::vector.0"* %4
+    load (9.999984e-01) from %"class.std::vector.0"* %4
+Warning: wrong traversal order, or recursive call
+On function .omp_outlined..39
+Round 0
+  alias entry   %26 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 0, i32 0, i32 0, i32 0
+  alias entry   %27 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 1, i32 0, i32 0, i32 0
+  alias entry   %28 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 6
+    alias entry   %29 = bitcast %"class.std::vector.0"* %28 to i64*
+  alias entry   %30 = getelementptr inbounds %class.Graph, %class.Graph* %3, i64 0, i32 6, i32 0, i32 0, i32 1
+    alias entry   %31 = bitcast i64** %30 to i64*
+  alias entry   %32 = getelementptr inbounds %"class.std::vector.29", %"class.std::vector.29"* %5, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.988141e+02) from %class.Graph* %3
+    load (3.180957e+03) from %class.Graph* %3
+    load (3.180957e+03) from %class.Graph* %3
+    load (3.180957e+03) from %class.Graph* %3
+    load (1.590478e+03) from %"class.std::vector.29"* %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %class.Graph* %3
+  load: 9.741684e+03		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.29"* %5
+  load: 1.590478e+03		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp.reduction.reduction_func.41
+Round 0
+    alias entry   %3 = bitcast i8* %1 to i64**, !dbg !10261
+    alias entry   %5 = bitcast i8* %0 to i64**, !dbg !10261
+Round 1
+Round end
+    load (1.000000e+00) from i8* %1
+    load (1.000000e+00) from i8* %0
+  Frequency of i8* %0
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i8* %1
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..42
+Round 0
+  alias entry   %32 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %4, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (1.000000e+00) from i32* %2
+    load (3.157895e-01) from %"class.std::vector.0"* %4
+    load (2.105263e-01) from i64* %3
+    store (2.105263e-01) to i64* %3
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %2
+  load: 1.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %3
+  load: 2.105263e-01		  store: 2.105263e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %4
+  load: 3.157895e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi
+Round 0
+  alias entry   %68 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 2, !dbg !11180
+  alias entry   %85 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !11380
+    alias entry   %86 = bitcast i64** %85 to i64*, !dbg !11380
+    alias entry   %88 = bitcast %class.Graph* %2 to i64*, !dbg !11384
+    alias entry   %93 = bitcast %class.Graph* %2 to i8**, !dbg !11392
+  alias entry   %98 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 6, !dbg !11399
+  alias entry   %99 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 6, i32 0, i32 0, i32 1, !dbg !11402
+    alias entry   %100 = bitcast i64** %99 to i64*, !dbg !11402
+    alias entry   %102 = bitcast %"class.std::vector.0"* %98 to i64*, !dbg !11403
+    alias entry   %107 = bitcast %"class.std::vector.0"* %98 to i8**, !dbg !11410
+  alias entry   %112 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 1, !dbg !11417
+  alias entry   %113 = getelementptr inbounds %class.Graph, %class.Graph* %2, i64 0, i32 1, i32 0, i32 0, i32 1, !dbg !11424
+    alias entry   %114 = bitcast %struct.Edge** %113 to i64*, !dbg !11424
+    alias entry   %116 = bitcast %"class.std::vector.5"* %112 to i64*, !dbg !11428
+    alias entry   %121 = bitcast %"class.std::vector.5"* %112 to i8**, !dbg !11440
+Round 1
+Round end
+    load (9.999981e-01) from %class.Graph* %2
+Warning: wrong traversal order, or recursive call
+On function .omp_outlined..45
+Round 0
+Round end
+    call (1.058333e+01, 2.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %5
+    call (1.058333e+01, 1.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %6
+    call (1.058333e+01, 1.721875e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Edge* %7
+    call (1.058333e+01, 8.992188e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %8
+    call (1.058333e+01, 0.000000e+00, 1.000000e+00, 0.000000e+00, 0.000000e+00) using i64* %9
+    call (1.058333e+01, 7.500000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using double* %10
+    call (1.058333e+01, 2.121875e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %11
+    call (1.058333e+01, 5.000000e-01, 0.000000e+00, 0.000000e+00, 0.000000e+00) using %struct.Comm* %12
+    call (1.058333e+01, 5.000000e-01, 5.000000e-01, 0.000000e+00, 0.000000e+00) using double* %14
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %5
+  load: 2.116667e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %6
+  load: 1.058333e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Edge* %7
+  load: 1.822318e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %8
+  load: 9.516732e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %9
+  load: 0.000000e+00		  store: 1.058333e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %10
+  load: 7.937500e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %11
+  load: 2.245651e+02		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %12
+  load: 5.291667e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %14
+  load: 5.291667e+00		  store: 5.291667e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..46
+Round 0
+Round end
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %3
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %4
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Edge* %5
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %6
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %7
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %8
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %9
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.Comm* %10
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of double* %12
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_outlined..49
+Round 0
+  alias entry   %28 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %3, i64 0, i32 0, i32 0, i32 0
+Round 1
+Round end
+    load (3.200000e-01) from %"class.std::vector.0"* %3
+    load (3.200000e-01) from i64** %4
+    load (3.200000e-01) from i64** %5
+  Frequency of i32* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i32* %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %3
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64** %4
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64** %5
+  load: 3.200000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function main
+Round 0
+    base alias entry   %14 = alloca i8**, align 8
+    alias entry   %33 = load i8**, i8*** %14, align 8, !dbg !10342, !tbaa !10335
+Round 1
+Round end
+  Frequency of i8** %1
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZN11GenerateRGGC2ElP19ompi_communicator_t
+Round 0
+  alias entry   %4 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 0, !dbg !10266
+  alias entry   %5 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10276
+    base alias entry   %5 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10276
+  alias entry   %6 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 5, !dbg !10279
+    alias entry   %8 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10281, !tbaa !10278
+  alias entry   %9 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 4, !dbg !10282
+  alias entry   %11 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 7, !dbg !10284
+  alias entry   %12 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 6, !dbg !10287
+  alias entry   %36 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 1, !dbg !10320
+    alias entry   %100 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10478, !tbaa !10278
+    alias entry   %171 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10565, !tbaa !10278
+  alias entry   %183 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2, !dbg !10579
+    alias entry   %190 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %5, align 8, !dbg !10583, !tbaa !10278
+Round 1
+Round end
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    load (1.000000e+00) from %class.GenerateRGG* %0
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    load (1.000000e+00) from %class.GenerateRGG* %0
+    load (5.000000e-01) from %class.GenerateRGG* %0
+    store (2.500000e-01) to %class.GenerateRGG* %0
+    store (3.437500e-01) to %class.GenerateRGG* %0
+    store (2.500000e-01) to %class.GenerateRGG* %0
+    load (1.000000e+00) from %class.GenerateRGG* %0
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    load (5.000000e-01) from %class.GenerateRGG* %0
+    load (5.000000e-01) from %class.GenerateRGG* %0
+    load (5.000000e-01) from %class.GenerateRGG* %0
+    load (7.656250e-01) from %class.GenerateRGG* %0
+    load (7.656250e-01) from %class.GenerateRGG* %0
+    store (1.000000e+00) to %class.GenerateRGG* %0
+    load (1.000000e+00) from %class.GenerateRGG* %0
+    load (1.000000e+00) from %class.GenerateRGG* %0
+  Frequency of %class.GenerateRGG* %0
+  load: 8.531250e+00		  store: 6.843750e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %struct.ompi_communicator_t* %2
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZN11GenerateRGG8generateEbbi
+Round 0
+  alias entry   %27 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 1, !dbg !10306
+  alias entry   %75 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 6, !dbg !10592
+  alias entry   %112 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 7, !dbg !10709
+  alias entry   %153 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 0, !dbg !10828
+  alias entry   %156 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 4, !dbg !10832
+  alias entry   %160 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 5, !dbg !10836
+  alias entry   %362 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !10915
+  alias entry   %696 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 3, !dbg !11101
+  alias entry   %772 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2
+  alias entry   %1095 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2
+  alias entry   %1388 = getelementptr inbounds %class.GenerateRGG, %class.GenerateRGG* %0, i64 0, i32 2
+Round 1
+Round end
+    load (1.000000e+00) from %class.GenerateRGG* %0
+    load (6.249994e-01) from %class.GenerateRGG* %0
+    load (9.999990e-01) from %class.GenerateRGG* %0
+    load (4.999995e-01) from %class.GenerateRGG* %0
+    load (3.124994e-01) from %class.GenerateRGG* %0
+    load (9.999985e-01) from %class.GenerateRGG* %0
+    load (4.999993e-01) from %class.GenerateRGG* %0
+    load (3.124992e-01) from %class.GenerateRGG* %0
+    load (9.999971e-01) from %class.GenerateRGG* %0
+    load (9.999971e-01) from %class.GenerateRGG* %0
+    load (9.999962e-01) from %class.GenerateRGG* %0
+    load (9.999962e-01) from %class.GenerateRGG* %0
+    load (4.999966e-01) from %class.GenerateRGG* %0
+    load (4.999971e-01) from %class.GenerateRGG* %0
+    load (4.999971e-01) from %class.GenerateRGG* %0
+    load (4.999966e-01) from %class.GenerateRGG* %0
+    load (9.999923e-01) from %class.GenerateRGG* %0
+    load (9.999914e-01) from %class.GenerateRGG* %0
+    load (3.749968e-01) from %class.GenerateRGG* %0
+    load (3.749964e-01) from %class.GenerateRGG* %0
+    load (9.999890e-01) from %class.GenerateRGG* %0
+    load (9.998746e-01) from %class.GenerateRGG* %0
+    load (3.199362e+02) from %class.GenerateRGG* %0
+    load (3.199361e+02) from %class.GenerateRGG* %0
+    load (6.249210e-01) from %class.GenerateRGG* %0
+    load (6.249210e-01) from %class.GenerateRGG* %0
+    load (9.998736e-01) from %class.GenerateRGG* %0
+    load (9.998736e-01) from %class.GenerateRGG* %0
+    load (9.998736e-01) from %class.GenerateRGG* %0
+    load (9.998736e-01) from %class.GenerateRGG* %0
+    load (9.998726e-01) from %class.GenerateRGG* %0
+    load (9.998726e-01) from %class.GenerateRGG* %0
+    load (9.998726e-01) from %class.GenerateRGG* %0
+    load (9.998726e-01) from %class.GenerateRGG* %0
+    load (9.998717e-01) from %class.GenerateRGG* %0
+    load (9.998717e-01) from %class.GenerateRGG* %0
+    load (9.998717e-01) from %class.GenerateRGG* %0
+    load (9.998717e-01) from %class.GenerateRGG* %0
+    load (9.998707e-01) from %class.GenerateRGG* %0
+    load (9.998707e-01) from %class.GenerateRGG* %0
+    load (9.998707e-01) from %class.GenerateRGG* %0
+    load (9.998707e-01) from %class.GenerateRGG* %0
+    load (9.998698e-01) from %class.GenerateRGG* %0
+    load (4.999349e-01) from %class.GenerateRGG* %0
+    load (2.499674e-01) from %class.GenerateRGG* %0
+    load (7.997451e+01) from %class.GenerateRGG* %0
+    load (3.998725e+01) from %class.GenerateRGG* %0
+    load (3.998725e+01) from %class.GenerateRGG* %0
+    load (7.997448e+01) from %class.GenerateRGG* %0
+    load (4.999063e-01) from %class.GenerateRGG* %0
+    load (2.499531e-01) from %class.GenerateRGG* %0
+    load (7.996993e+01) from %class.GenerateRGG* %0
+    load (3.998497e+01) from %class.GenerateRGG* %0
+    load (3.998497e+01) from %class.GenerateRGG* %0
+    load (7.996991e+01) from %class.GenerateRGG* %0
+    load (9.998126e-01) from %class.GenerateRGG* %0
+    load (9.998116e-01) from %class.GenerateRGG* %0
+    load (9.998116e-01) from %class.GenerateRGG* %0
+    load (9.998116e-01) from %class.GenerateRGG* %0
+    load (9.998107e-01) from %class.GenerateRGG* %0
+    load (9.998107e-01) from %class.GenerateRGG* %0
+    load (9.998107e-01) from %class.GenerateRGG* %0
+    load (9.998091e-01) from %class.GenerateRGG* %0
+    load (9.998091e-01) from %class.GenerateRGG* %0
+    load (9.998091e-01) from %class.GenerateRGG* %0
+    load (9.998082e-01) from %class.GenerateRGG* %0
+    load (9.998082e-01) from %class.GenerateRGG* %0
+    load (9.998082e-01) from %class.GenerateRGG* %0
+    load (9.998072e-01) from %class.GenerateRGG* %0
+    load (9.998015e-01) from %class.GenerateRGG* %0
+    load (6.248724e-01) from %class.GenerateRGG* %0
+    load (6.248718e-01) from %class.GenerateRGG* %0
+    load (1.952724e-01) from %class.GenerateRGG* %0
+    load (3.905445e-01) from %class.GenerateRGG* %0
+    load (3.905442e-01) from %class.GenerateRGG* %0
+    load (6.248393e-01) from %class.GenerateRGG* %0
+    load (1.249644e+01) from %class.GenerateRGG* %0
+    load (1.249643e+01) from %class.GenerateRGG* %0
+    load (1.171538e+00) from %class.GenerateRGG* %0
+    load (5.857690e-01) from %class.GenerateRGG* %0
+    load (2.928845e-01) from %class.GenerateRGG* %0
+    load (1.464422e-01) from %class.GenerateRGG* %0
+    load (6.248387e-01) from %class.GenerateRGG* %0
+    load (6.248381e-01) from %class.GenerateRGG* %0
+    load (1.249638e+01) from %class.GenerateRGG* %0
+    load (6.248253e-01) from %class.GenerateRGG* %0
+    load (3.905154e-01) from %class.GenerateRGG* %0
+    load (2.440719e-01) from %class.GenerateRGG* %0
+    load (6.248247e-01) from %class.GenerateRGG* %0
+    load (4.881438e+00) from %class.GenerateRGG* %0
+    load (9.997431e-01) from %class.GenerateRGG* %0
+    load (9.997421e-01) from %class.GenerateRGG* %0
+    load (9.997406e-01) from %class.GenerateRGG* %0
+    load (9.997406e-01) from %class.GenerateRGG* %0
+    load (1.999481e+01) from %class.GenerateRGG* %0
+    load (9.997388e-01) from %class.GenerateRGG* %0
+    load (9.997385e-01) from %class.GenerateRGG* %0
+  Frequency of %class.GenerateRGG* %0
+  load: 1.246995e+03		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZN14BinaryEdgeList4readEiiiSs
+Round 0
+  alias entry   %39 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 4, !dbg !10380
+  alias entry   %41 = getelementptr inbounds %"class.std::basic_string", %"class.std::basic_string"* %4, i64 0, i32 0, i32 0, !dbg !10388
+  alias entry   %99 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 0, !dbg !10514
+    alias entry   %100 = bitcast %class.BinaryEdgeList* %0 to i8*, !dbg !10515
+  alias entry   %104 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 1, !dbg !10518
+    alias entry   %105 = bitcast i64* %104 to i8*, !dbg !10519
+  alias entry   %118 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 2, !dbg !10532
+  alias entry   %182 = getelementptr inbounds %class.BinaryEdgeList, %class.BinaryEdgeList* %0, i64 0, i32 3, !dbg !10605
+Round 1
+Round end
+    load (9.999971e-01) from %class.BinaryEdgeList* %0
+    load (9.999971e-01) from %"class.std::basic_string"* %4
+    load (6.249948e-01) from %class.BinaryEdgeList* %0
+    load (9.999905e-01) from %class.BinaryEdgeList* %0
+    store (9.999905e-01) to %class.BinaryEdgeList* %0
+    load (9.999895e-01) from %class.BinaryEdgeList* %0
+    load (9.999886e-01) from %class.BinaryEdgeList* %0
+    load (9.999886e-01) from %class.BinaryEdgeList* %0
+    load (9.999729e-01) from %class.BinaryEdgeList* %0
+    store (9.999729e-01) to %class.BinaryEdgeList* %0
+    load (9.999714e-01) from %class.BinaryEdgeList* %0
+    load (9.999714e-01) from %class.BinaryEdgeList* %0
+    load (9.999547e-01) from %class.BinaryEdgeList* %0
+    load (1.999909e+01) from %class.BinaryEdgeList* %0
+  Frequency of %class.BinaryEdgeList* %0
+  load: 2.962391e+01		  store: 1.999963e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::basic_string"* %4
+  load: 9.999971e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt8_Rb_treeIlSt4pairIKl4CommESt10_Select1stIS3_ESt4lessIlESaIS3_EE8_M_eraseEPSt13_Rb_tree_nodeIS3_E
+Round 0
+Round end
+Warning: wrong traversal order, or recursive call
+On function _ZN5GraphC2EllllP19ompi_communicator_t
+Round 0
+  alias entry   %8 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, !dbg !10272
+  alias entry   %9 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 1, !dbg !10272
+  alias entry   %10 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 2, !dbg !10309
+    alias entry   %11 = bitcast %class.Graph* %0 to i8*, !dbg !10309
+  alias entry   %12 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 3, !dbg !10320
+  alias entry   %13 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 4, !dbg !10322
+  alias entry   %14 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 5, !dbg !10324
+  alias entry   %15 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, !dbg !10272
+    alias entry   %16 = bitcast %"class.std::vector.0"* %15 to i8*, !dbg !10332
+  alias entry   %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10334
+    base alias entry   %17 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 7, !dbg !10334
+  alias entry   %18 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 9, !dbg !10336
+    alias entry   %21 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %17, align 8, !dbg !10338, !tbaa !10335
+  alias entry   %22 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 8, !dbg !10339
+  alias entry   %28 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, i32 0, i32 0, i32 1, !dbg !10361
+    alias entry   %29 = bitcast i64** %28 to i64*, !dbg !10361
+    alias entry   %31 = bitcast %class.Graph* %0 to i64*, !dbg !10365
+  alias entry   %45 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 1, i32 0, i32 0, i32 1, !dbg !10416
+    alias entry   %46 = bitcast %struct.Edge** %45 to i64*, !dbg !10416
+    alias entry   %48 = bitcast %"class.std::vector.5"* %9 to i64*, !dbg !10420
+  alias entry   %64 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 6, i32 0, i32 0, i32 1, !dbg !10455
+    alias entry   %65 = bitcast i64** %64 to i64*, !dbg !10455
+    alias entry   %67 = bitcast %"class.std::vector.0"* %15 to i64*, !dbg !10456
+  alias entry   %76 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %15, i64 0, i32 0, i32 0, i32 0
+  alias entry   %110 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %15, i64 0, i32 0, i32 0, i32 0, !dbg !10511
+  alias entry   %116 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %9, i64 0, i32 0, i32 0, i32 0, !dbg !10547
+  alias entry   %122 = getelementptr inbounds %class.Graph, %class.Graph* %0, i64 0, i32 0, i32 0, i32 0, i32 0, !dbg !10576
+Round 1
+Round end
+    store (1.000000e+00) to %class.Graph* %0
+    store (1.000000e+00) to %class.Graph* %0
+    store (1.000000e+00) to %class.Graph* %0
+    store (1.000000e+00) to %class.Graph* %0
+    store (1.000000e+00) to %class.Graph* %0
+    load (9.999990e-01) from %class.Graph* %0
+    load (9.999980e-01) from %class.Graph* %0
+    load (9.999980e-01) from %class.Graph* %0
+    load (9.999980e-01) from %class.Graph* %0
+Warning: wrong traversal order, or recursive call
+On function _ZN3LCGC2EjPdlP19ompi_communicator_t
+Round 0
+  alias entry   %6 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 3, !dbg !10268
+  alias entry   %7 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 4, !dbg !10277
+  alias entry   %8 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 6, !dbg !10279
+    base alias entry   %8 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 6, !dbg !10279
+  alias entry   %9 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, !dbg !10281
+    alias entry   %10 = bitcast %"class.std::vector.0"* %9 to i8*, !dbg !10300
+  alias entry   %11 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0, !dbg !10302
+    base alias entry   %11 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0, !dbg !10302
+  alias entry   %12 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 1, !dbg !10306
+    alias entry   %15 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %11, align 8, !dbg !10308, !tbaa !10305
+  alias entry   %16 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 2, !dbg !10309
+  alias entry   %20 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, i32 0, i32 0, i32 1, !dbg !10326
+    alias entry   %21 = bitcast i64** %20 to i64*, !dbg !10326
+    alias entry   %23 = bitcast %"class.std::vector.0"* %9 to i64*, !dbg !10330
+  alias entry   %42 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 5, !dbg !10359
+  alias entry   %45 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %9, i64 0, i32 0, i32 0, i32 0, !dbg !10374
+  alias entry   %52 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 5, !dbg !10399
+    alias entry   %53 = bitcast i64* %52 to i8*, !dbg !10400
+    alias entry   %54 = load %struct.ompi_communicator_t*, %struct.ompi_communicator_t** %11, align 8, !dbg !10401, !tbaa !10305
+Round 1
+Round end
+    store (1.000000e+00) to %class.LCG* %0
+    store (1.000000e+00) to %class.LCG* %0
+    store (1.000000e+00) to %class.LCG* %0
+    store (1.000000e+00) to %class.LCG* %0
+    load (9.999989e-01) from %class.LCG* %0
+    load (9.999982e-01) from %class.LCG* %0
+    load (9.999982e-01) from %class.LCG* %0
+    load (9.999982e-01) from %class.LCG* %0
+Warning: wrong traversal order, or recursive call
+On function _ZNSt24uniform_int_distributionIiEclISt26linear_congruential_engineImLm16807ELm0ELm2147483647EEEEiRT_RKNS0_10param_typeE
+Round 0
+  alias entry   %5 = getelementptr inbounds %"struct.std::uniform_int_distribution<int>::param_type", %"struct.std::uniform_int_distribution<int>::param_type"* %2, i64 0, i32 1, !dbg !10267
+  alias entry   %8 = getelementptr inbounds %"struct.std::uniform_int_distribution<int>::param_type", %"struct.std::uniform_int_distribution<int>::param_type"* %2, i64 0, i32 0, !dbg !10279
+  alias entry   %19 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0
+  alias entry   %37 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0
+  alias entry   %51 = getelementptr inbounds %"class.std::linear_congruential_engine", %"class.std::linear_congruential_engine"* %1, i64 0, i32 0, !dbg !10376
+Round 1
+Round end
+    load (1.000000e+00) from %"struct.std::uniform_int_distribution<int>::param_type"* %2
+    load (1.000000e+00) from %"struct.std::uniform_int_distribution<int>::param_type"* %2
+    load (5.000000e-01) from %"class.std::linear_congruential_engine"* %1
+    store (5.000000e-01) to %"class.std::linear_congruential_engine"* %1
+Warning: wrong traversal order, or recursive call
+On function _ZNSt6vectorIlSaIlEEaSERKS1_
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 1, !dbg !10278
+    alias entry   %6 = bitcast i64** %5 to i64*, !dbg !10278
+    alias entry   %8 = bitcast %"class.std::vector.0"* %1 to i64*, !dbg !10285
+  alias entry   %12 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10294
+    alias entry   %13 = bitcast i64** %12 to i64*, !dbg !10294
+    alias entry   %15 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10296
+  alias entry   %.phi.trans.insert = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0
+  alias entry   %35 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10460
+  alias entry   %42 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10490
+    alias entry   %43 = bitcast i64** %42 to i64*, !dbg !10490
+  alias entry   %54 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %1, i64 0, i32 0, i32 0, i32 0, !dbg !10573
+  alias entry   %74 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10633
+  alias entry   %77 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10635
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.0"* %1
+    load (6.250000e-01) from %"class.std::vector.0"* %1
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (1.953125e-01) from %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    load (9.765625e-02) from %"class.std::vector.0"* %1
+    load (9.765625e-02) from %"class.std::vector.0"* %0
+    load (9.765625e-02) from %"class.std::vector.0"* %0
+    load (9.765625e-02) from %"class.std::vector.0"* %1
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    store (6.250000e-01) to %"class.std::vector.0"* %0
+  Frequency of %"class.std::vector.0"* %0
+  load: 2.578125e+00		  store: 1.250000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"class.std::vector.0"* %1
+  load: 1.445312e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorIlSaIlEE14_M_fill_insertEN9__gnu_cxx17__normal_iteratorIPlS1_EEmRKl
+Round 0
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10281
+    alias entry   %9 = bitcast i64** %8 to i64*, !dbg !10281
+  alias entry   %11 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10288
+    alias entry   %12 = bitcast i64** %11 to i64*, !dbg !10288
+    alias entry   %543 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10728
+  alias entry   %729 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10820
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (3.125000e-01) from i64* %3
+    load (9.765625e-02) from %"class.std::vector.0"* %0
+    store (1.562500e-01) to %"class.std::vector.0"* %0
+    store (1.562500e-01) to %"class.std::vector.0"* %0
+    load (9.765625e-02) from %"class.std::vector.0"* %0
+    store (1.562500e-01) to %"class.std::vector.0"* %0
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    load (3.125000e-01) from i64* %3
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+  Frequency of %"class.std::vector.0"* %0
+  load: 2.382812e+00		  store: 1.406250e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of i64* %3
+  load: 6.250000e-01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorI4EdgeSaIS0_EE17_M_default_appendEm
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274
+    alias entry   %6 = bitcast %struct.Edge** %5 to i64*, !dbg !10274
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281
+    alias entry   %83 = bitcast %"class.std::vector.5"* %0 to i64*, !dbg !10375
+  alias entry   %104 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 0
+  alias entry   %111 = getelementptr inbounds %"class.std::vector.5", %"class.std::vector.5"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10431
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.5"* %0
+    load (6.250000e-01) from %"class.std::vector.5"* %0
+    load (3.125000e-01) from %"class.std::vector.5"* %0
+    load (1.953125e-01) from %"class.std::vector.5"* %0
+    load (1.953125e-01) from %"class.std::vector.5"* %0
+    load (3.125000e-01) from %"class.std::vector.5"* %0
+    store (3.125000e-01) to %"class.std::vector.5"* %0
+    store (3.125000e-01) to %"class.std::vector.5"* %0
+  Frequency of %"class.std::vector.5"* %0
+  load: 2.265625e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZN3LCG18parallel_prefix_opEv
+Round 0
+  alias entry   %10 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 4, !dbg !10283
+  alias entry   %168 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 1, !dbg !10362
+  alias entry   %174 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 2, !dbg !10269
+  alias entry   %178 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 0
+  alias entry   %186 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 5, !dbg !10373
+  alias entry   %250 = getelementptr inbounds %class.LCG, %class.LCG* %0, i64 0, i32 7, i32 0, i32 0, i32 0, !dbg !10373
+Round 1
+Round end
+    load (1.000000e+00) from %class.LCG* %0
+    load (1.000000e+00) from %class.LCG* %0
+    load (1.000000e+00) from %class.LCG* %0
+    load (1.000000e+00) from %class.LCG* %0
+    load (2.005882e+01) from %class.LCG* %0
+    load (2.005882e+01) from %class.LCG* %0
+    load (2.005882e+01) from %class.LCG* %0
+    load (2.005882e+01) from %class.LCG* %0
+    load (1.000000e+00) from %class.LCG* %0
+  Frequency of %class.LCG* %0
+  load: 8.523529e+01		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorI9EdgeTupleSaIS0_EE17_M_default_appendEm
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10273
+    alias entry   %6 = bitcast %struct.EdgeTuple** %5 to i64*, !dbg !10273
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10280
+    alias entry   %60 = bitcast %"class.std::vector.84"* %0 to i64*, !dbg !10369
+  alias entry   %81 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0
+  alias entry   %88 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10425
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.84"* %0
+    load (6.250000e-01) from %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    load (1.953125e-01) from %"class.std::vector.84"* %0
+    load (1.953125e-01) from %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    store (3.125000e-01) to %"class.std::vector.84"* %0
+    store (3.125000e-01) to %"class.std::vector.84"* %0
+  Frequency of %"class.std::vector.84"* %0
+  load: 2.578125e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZSt9__find_ifIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_E_ET_SC_SC_T0_St26random_access_iterator_tag
+Round 0
+Round end
+On function _ZNSt6vectorI9EdgeTupleSaIS0_EE15_M_range_insertIN9__gnu_cxx17__normal_iteratorIPS0_S2_EEEEvS7_T_S8_St20forward_iterator_tag
+Round 0
+  alias entry   %13 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10344
+    alias entry   %14 = bitcast %struct.EdgeTuple** %13 to i64*, !dbg !10344
+  alias entry   %16 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10351
+    alias entry   %17 = bitcast %struct.EdgeTuple** %16 to i64*, !dbg !10351
+    alias entry   %116 = bitcast %"class.std::vector.84"* %0 to i64*, !dbg !10799
+  alias entry   %137 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0
+  alias entry   %142 = getelementptr inbounds %"class.std::vector.84", %"class.std::vector.84"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10851
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.84"* %0
+    load (6.250000e-01) from %"class.std::vector.84"* %0
+    load (9.765625e-02) from %"class.std::vector.84"* %0
+    store (1.562500e-01) to %"class.std::vector.84"* %0
+    load (9.765625e-02) from %"class.std::vector.84"* %0
+    store (1.562500e-01) to %"class.std::vector.84"* %0
+    load (9.765625e-02) from %"class.std::vector.84"* %0
+    store (1.562500e-01) to %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    load (1.953125e-01) from %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    load (3.125000e-01) from %"class.std::vector.84"* %0
+    store (3.125000e-01) to %"class.std::vector.84"* %0
+    store (3.125000e-01) to %"class.std::vector.84"* %0
+    store (3.125000e-01) to %"class.std::vector.84"* %0
+  Frequency of %"class.std::vector.84"* %0
+  load: 2.675781e+00		  store: 1.406250e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZSt16__introsort_loopIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEElZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_T0_T1_
+Round 0
+Round end
+On function _ZSt22__final_insertion_sortIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_T0_
+Round 0
+Round end
+On function _ZSt13__heap_selectIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_SC_T0_
+Round 0
+Round end
+On function _ZSt13__adjust_heapIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEElS2_ZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_T0_SD_T1_T2_
+Round 0
+Round end
+On function _ZSt22__move_median_to_firstIN9__gnu_cxx17__normal_iteratorIP9EdgeTupleSt6vectorIS2_SaIS2_EEEEZN11GenerateRGG8generateEbbiEUlRKS2_SA_E_EvT_SC_SC_SC_T0_
+Round 0
+Round end
+On function _ZNSt6vectorIlSaIlEE17_M_default_appendEm
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274
+    alias entry   %6 = bitcast i64** %5 to i64*, !dbg !10274
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281
+    alias entry   %20 = bitcast i64** %8 to i64*, !dbg !10380
+    alias entry   %21 = bitcast %"class.std::vector.0"* %0 to i64*, !dbg !10381
+  alias entry   %42 = getelementptr inbounds %"class.std::vector.0", %"class.std::vector.0"* %0, i64 0, i32 0, i32 0, i32 0
+    alias entry   %65 = bitcast %"class.std::vector.0"* %0 to i8**, !dbg !10628
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (6.250000e-01) from %"class.std::vector.0"* %0
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    load (3.125000e-01) from %"class.std::vector.0"* %0
+    load (1.953125e-01) from %"class.std::vector.0"* %0
+    load (1.953125e-01) from %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+    store (3.125000e-01) to %"class.std::vector.0"* %0
+  Frequency of %"class.std::vector.0"* %0
+  load: 2.265625e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorIdSaIdEE17_M_default_appendEm
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10274
+    alias entry   %6 = bitcast double** %5 to i64*, !dbg !10274
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10281
+    alias entry   %20 = bitcast double** %8 to i64*, !dbg !10381
+    alias entry   %21 = bitcast %"class.std::vector.10"* %0 to i64*, !dbg !10382
+  alias entry   %42 = getelementptr inbounds %"class.std::vector.10", %"class.std::vector.10"* %0, i64 0, i32 0, i32 0, i32 0
+    alias entry   %65 = bitcast %"class.std::vector.10"* %0 to i8**, !dbg !10630
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.10"* %0
+    load (6.250000e-01) from %"class.std::vector.10"* %0
+    load (3.125000e-01) from %"class.std::vector.10"* %0
+    load (3.125000e-01) from %"class.std::vector.10"* %0
+    load (1.953125e-01) from %"class.std::vector.10"* %0
+    load (1.953125e-01) from %"class.std::vector.10"* %0
+    store (3.125000e-01) to %"class.std::vector.10"* %0
+    store (3.125000e-01) to %"class.std::vector.10"* %0
+  Frequency of %"class.std::vector.10"* %0
+  load: 2.265625e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorI4CommSaIS0_EE17_M_default_appendEm
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10460
+    alias entry   %6 = bitcast %struct.Comm** %5 to i64*, !dbg !10460
+  alias entry   %8 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10467
+    alias entry   %20 = bitcast %"class.std::vector.15"* %0 to i64*, !dbg !10551
+  alias entry   %41 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 0
+  alias entry   %48 = getelementptr inbounds %"class.std::vector.15", %"class.std::vector.15"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10607
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.15"* %0
+    load (6.250000e-01) from %"class.std::vector.15"* %0
+    load (3.125000e-01) from %"class.std::vector.15"* %0
+    load (3.125000e-01) from %"class.std::vector.15"* %0
+    load (1.953125e-01) from %"class.std::vector.15"* %0
+    load (1.953125e-01) from %"class.std::vector.15"* %0
+    load (3.125000e-01) from %"class.std::vector.15"* %0
+    store (3.125000e-01) to %"class.std::vector.15"* %0
+    store (3.125000e-01) to %"class.std::vector.15"* %0
+  Frequency of %"class.std::vector.15"* %0
+  load: 2.578125e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt27__uninitialized_default_n_1ILb0EE18__uninit_default_nIPSt13unordered_setIlSt4hashIlESt8equal_toIlESaIlEEmEEvT_T0_
+Round 0
+Round end
+  Frequency of %"class.std::unordered_set"* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt10_HashtableIlSt4pairIKllESaIS2_ENSt8__detail10_Select1stESt8equal_toIlESt4hashIlENS4_18_Mod_range_hashingENS4_20_Default_ranged_hashENS4_20_Prime_rehash_policyENS4_17_Hashtable_traitsILb0ELb0ELb1EEEE21_M_insert_unique_nodeEmmPNS4_10_Hash_nodeIS2_Lb0EEE
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 4, !dbg !10268
+  alias entry   %6 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 4, i32 1, !dbg !10275
+  alias entry   %8 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 1, !dbg !10282
+  alias entry   %10 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 3, !dbg !10288
+  alias entry   %17 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 0
+  alias entry   %29 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 2, i32 0, !dbg !10428
+    alias entry   %30 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to %"struct.std::__detail::_Hash_node"**, !dbg !10429
+  alias entry   %32 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %29, i64 0, i32 0, !dbg !10432
+    alias entry   %35 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to i64*
+    base alias entry   %44 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %28, i64 %43, !dbg !10509
+    alias entry   %61 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10529, !tbaa !10511
+  alias entry   %62 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %61, i64 0, i32 0, !dbg !10530
+    alias entry   %76 = bitcast %"class.std::_Hashtable"* %0 to i8**, !dbg !10550
+    alias entry   %82 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i8*, !dbg !10618
+  alias entry   %86 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 0, !dbg !10296
+  alias entry   %93 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, !dbg !10627
+    alias entry   %94 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i64*, !dbg !10628
+    base alias entry   %96 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %95, i64 0, i32 0, !dbg !10630
+  alias entry   %98 = getelementptr inbounds %"class.std::_Hashtable", %"class.std::_Hashtable"* %0, i64 0, i32 2, i32 0, !dbg !10639
+    alias entry   %99 = bitcast %"struct.std::__detail::_Hash_node_base"* %98 to i64*, !dbg !10640
+  alias entry   %101 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, !dbg !10641
+  alias entry   %102 = getelementptr inbounds %"struct.std::__detail::_Hash_node", %"struct.std::__detail::_Hash_node"* %3, i64 0, i32 0, i32 0, !dbg !10641
+    alias entry   %103 = bitcast %"struct.std::__detail::_Hash_node"* %3 to i64*, !dbg !10642
+  alias entry   %104 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %98, i64 0, i32 0, !dbg !10645
+    base alias entry   %104 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %98, i64 0, i32 0, !dbg !10645
+    base alias entry   %113 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %84, i64 %112, !dbg !10676
+    base alias entry   %117 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %116, i64 %85, !dbg !10678
+Round 1
+Warning: the first offset is not constant
+    alias entry   %45 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10509, !tbaa !10511
+    alias entry   %57 = bitcast %"struct.std::__detail::_Hash_node_base"* %45 to i64*, !dbg !10525
+  base alias offset entry (0)   %95 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %87, align 8, !dbg !10629, !tbaa !10511
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Round 2
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Round end
+    load (1.000000e+00) from %"class.std::_Hashtable"* %0
+    load (1.000000e+00) from %"class.std::_Hashtable"* %0
+    load (1.000000e+00) from %"class.std::_Hashtable"* %0
+    load (5.000000e-01) from %"class.std::_Hashtable"* %0
+    load (4.999995e-01) from %"class.std::_Hashtable"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable"* %0
+    load (3.749996e+00) from %"class.std::_Hashtable"* %0
+    store (3.749996e+00) to %"class.std::_Hashtable"* %0
+    load (6.249994e+00) from %"class.std::_Hashtable"* %0
+    store (6.249994e+00) to %"class.std::_Hashtable"* %0
+    store (4.768372e-07) to %"class.std::_Hashtable"* %0
+    load (4.999995e-01) from %"class.std::_Hashtable"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable"* %0
+    store (6.249997e-01) to %"struct.std::__detail::_Hash_node"* %3
+    load (3.749998e-01) from %"class.std::_Hashtable"* %0
+    store (3.749998e-01) to %"struct.std::__detail::_Hash_node"* %3
+    store (3.749998e-01) to %"class.std::_Hashtable"* %0
+    load (3.749998e-01) from %"struct.std::__detail::_Hash_node"* %3
+    load (2.343749e-01) from %"class.std::_Hashtable"* %0
+    load (2.343749e-01) from %"class.std::_Hashtable"* %0
+    load (9.999995e-01) from %"class.std::_Hashtable"* %0
+    store (9.999995e-01) to %"class.std::_Hashtable"* %0
+  Frequency of %"class.std::_Hashtable"* %0
+  load: 1.634374e+01		  store: 1.287499e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"struct.std::__detail::_Hash_node"* %3
+  load: 3.749998e-01		  store: 9.999995e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt10_HashtableIllSaIlENSt8__detail9_IdentityESt8equal_toIlESt4hashIlENS1_18_Mod_range_hashingENS1_20_Default_ranged_hashENS1_20_Prime_rehash_policyENS1_17_Hashtable_traitsILb0ELb1ELb1EEEE21_M_insert_unique_nodeEmmPNS1_10_Hash_nodeIlLb0EEE
+Round 0
+  alias entry   %5 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 4, !dbg !10268
+  alias entry   %6 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 4, i32 1, !dbg !10275
+  alias entry   %8 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 1, !dbg !10282
+  alias entry   %10 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 3, !dbg !10288
+  alias entry   %17 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 0
+  alias entry   %29 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 2, i32 0, !dbg !10428
+    alias entry   %30 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to %"struct.std::__detail::_Hash_node.61"**, !dbg !10429
+  alias entry   %32 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %29, i64 0, i32 0, !dbg !10432
+    alias entry   %35 = bitcast %"struct.std::__detail::_Hash_node_base"* %29 to i64*
+    base alias entry   %44 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %28, i64 %43, !dbg !10469
+    alias entry   %61 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10489, !tbaa !10471
+  alias entry   %62 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %61, i64 0, i32 0, !dbg !10490
+    alias entry   %76 = bitcast %"class.std::_Hashtable.34"* %0 to i8**, !dbg !10510
+    alias entry   %82 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i8*, !dbg !10578
+  alias entry   %86 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 0, !dbg !10296
+  alias entry   %93 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, !dbg !10587
+    alias entry   %94 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i64*, !dbg !10588
+    base alias entry   %96 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %95, i64 0, i32 0, !dbg !10590
+  alias entry   %98 = getelementptr inbounds %"class.std::_Hashtable.34", %"class.std::_Hashtable.34"* %0, i64 0, i32 2, i32 0, !dbg !10599
+    alias entry   %99 = bitcast %"struct.std::__detail::_Hash_node_base"* %98 to i64*, !dbg !10600
+  alias entry   %101 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, !dbg !10601
+  alias entry   %102 = getelementptr inbounds %"struct.std::__detail::_Hash_node.61", %"struct.std::__detail::_Hash_node.61"* %3, i64 0, i32 0, i32 0, !dbg !10601
+    alias entry   %103 = bitcast %"struct.std::__detail::_Hash_node.61"* %3 to i64*, !dbg !10602
+  alias entry   %104 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %98, i64 0, i32 0, !dbg !10605
+    base alias entry   %104 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base", %"struct.std::__detail::_Hash_node_base"* %98, i64 0, i32 0, !dbg !10605
+    base alias entry   %113 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %84, i64 %112, !dbg !10630
+    base alias entry   %117 = getelementptr inbounds %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %116, i64 %85, !dbg !10632
+Round 1
+Warning: the first offset is not constant
+    alias entry   %45 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %44, align 8, !dbg !10469, !tbaa !10471
+    alias entry   %57 = bitcast %"struct.std::__detail::_Hash_node_base"* %45 to i64*, !dbg !10485
+  base alias offset entry (0)   %95 = load %"struct.std::__detail::_Hash_node_base"*, %"struct.std::__detail::_Hash_node_base"** %87, align 8, !dbg !10589, !tbaa !10471
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Round 2
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Warning: the first offset is not constant
+Round end
+    load (1.000000e+00) from %"class.std::_Hashtable.34"* %0
+    load (1.000000e+00) from %"class.std::_Hashtable.34"* %0
+    load (1.000000e+00) from %"class.std::_Hashtable.34"* %0
+    load (5.000000e-01) from %"class.std::_Hashtable.34"* %0
+    load (4.999995e-01) from %"class.std::_Hashtable.34"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable.34"* %0
+    load (3.749996e+00) from %"class.std::_Hashtable.34"* %0
+    store (3.749996e+00) to %"class.std::_Hashtable.34"* %0
+    load (6.249994e+00) from %"class.std::_Hashtable.34"* %0
+    store (6.249994e+00) to %"class.std::_Hashtable.34"* %0
+    store (4.768372e-07) to %"class.std::_Hashtable.34"* %0
+    load (4.999995e-01) from %"class.std::_Hashtable.34"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable.34"* %0
+    store (4.999995e-01) to %"class.std::_Hashtable.34"* %0
+    store (6.249997e-01) to %"struct.std::__detail::_Hash_node.61"* %3
+    load (3.749998e-01) from %"class.std::_Hashtable.34"* %0
+    store (3.749998e-01) to %"struct.std::__detail::_Hash_node.61"* %3
+    store (3.749998e-01) to %"class.std::_Hashtable.34"* %0
+    load (3.749998e-01) from %"struct.std::__detail::_Hash_node.61"* %3
+    load (2.343749e-01) from %"class.std::_Hashtable.34"* %0
+    load (2.343749e-01) from %"class.std::_Hashtable.34"* %0
+    load (9.999995e-01) from %"class.std::_Hashtable.34"* %0
+    store (9.999995e-01) to %"class.std::_Hashtable.34"* %0
+  Frequency of %"class.std::_Hashtable.34"* %0
+  load: 1.634374e+01		  store: 1.287499e+01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+  Frequency of %"struct.std::__detail::_Hash_node.61"* %3
+  load: 3.749998e-01		  store: 9.999995e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _ZNSt6vectorI8CommInfoSaIS0_EE17_M_default_appendEm
+Round 0
+  alias entry   %7 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 2, !dbg !10273
+    alias entry   %8 = bitcast %struct.CommInfo** %7 to i64*, !dbg !10273
+  alias entry   %10 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 1, !dbg !10280
+    alias entry   %54 = bitcast %struct.CommInfo** %10 to i64*, !dbg !10394
+    alias entry   %55 = bitcast %"class.std::vector.52"* %0 to i64*, !dbg !10395
+  alias entry   %76 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 0
+  alias entry   %84 = getelementptr inbounds %"class.std::vector.52", %"class.std::vector.52"* %0, i64 0, i32 0, i32 0, i32 0, !dbg !10449
+    alias entry   %133 = bitcast %"class.std::vector.52"* %0 to i8**, !dbg !10651
+Round 1
+Round end
+    load (6.250000e-01) from %"class.std::vector.52"* %0
+    load (6.250000e-01) from %"class.std::vector.52"* %0
+    load (3.125000e-01) from %"class.std::vector.52"* %0
+    load (3.125000e-01) from %"class.std::vector.52"* %0
+    load (1.953125e-01) from %"class.std::vector.52"* %0
+    load (1.953125e-01) from %"class.std::vector.52"* %0
+    load (3.125000e-01) from %"class.std::vector.52"* %0
+    store (3.125000e-01) to %"class.std::vector.52"* %0
+    store (3.125000e-01) to %"class.std::vector.52"* %0
+  Frequency of %"class.std::vector.52"* %0
+  load: 2.578125e+00		  store: 6.250000e-01 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function _GLOBAL__sub_I_main.cpp
+Round 0
+Round end
+On function .omp_offloading.descriptor_unreg
+Round 0
+Round end
+  Frequency of i8* %0
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 0.000000e+00 (target)
+On function .omp_offloading.descriptor_reg.nvptx64-nvidia-cuda
+Round 0
+Round end
+  ---- Identify Target Regions ----
+  target call:   %49 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z21distComputeModularityRK5GraphP4CommPKddi_l396.region_id, i32 5, i8** nonnull %24, i8** nonnull %26, i64* nonnull %28, i64* getelementptr inbounds ([5 x i64], [5 x i64]* @.offload_maptypes.0, i64 0, i64 0), i32 0, i32 0), !dbg !10317
+  target call:   %24 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %9, i8** nonnull %11, i64* nonnull %13, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0), !dbg !10269
+  target call:   %26 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %11, i8** nonnull %13, i64* nonnull %15, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0), !dbg !10269
+  target call:   %258 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20.1, i64 0, i64 0), i32 0, i32 0)
+          to label %259 unwind label %319, !dbg !11559
+  target call:   %276 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368.region_id, i32 12, i8** nonnull %200, i8** nonnull %202, i64* nonnull %204, i64* getelementptr inbounds ([12 x i64], [12 x i64]* @.offload_maptypes.47.2, i64 0, i64 0), i32 0, i32 0), !dbg !11584
+  target call:   %325 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15.3, i64 0, i64 0), i32 0, i32 0)
+          to label %326 unwind label %319, !dbg !11667
+  ---- Target Distance Calculation ----
+_Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi converges after 3 iterations
+target 0: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) 
+target 1: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) 
+target 2: (0: 1.000000e+00) (1: 1.000000e+00) (2: 1.000000e+00) (3: 1.000000e+00) (4: 1.000000e+00) (5: 1.000000e+00) 
+target 3: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 9.152967e+00) (4: 1.000095e+00) (5: 2.000190e+00) 
+target 4: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 8.152880e+00) (4: 9.091440e+00) (5: 1.000095e+00) 
+target 5: (0: 1.010000e+02) (1: 1.010000e+02) (2: 1.010000e+02) (3: 7.152791e+00) (4: 8.091353e+00) (5: 9.029914e+00) 
+  ---- OMP (/tmp/main-cdf4fe.bc, powerpc64le-unknown-linux-gnu) ----
+new entry   %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100)
+          to label %92 unwind label %291, !dbg !11387
+new entry   %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100)
+          to label %106 unwind label %295, !dbg !11405
+new entry   %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100)
+          to label %120 unwind label %299, !dbg !11431
+new entry   %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %129 unwind label %303, !dbg !11449
+new entry   %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %135 unwind label %307, !dbg !11462
+new entry   %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %141 unwind label %311, !dbg !11478
+new entry   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+new entry   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+new entry   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+Round 0
+  base alias entry   %130 = bitcast i64** %29 to i8**, !dbg !11450
+  base alias entry   %142 = bitcast i64** %30 to i8**, !dbg !11479
+  alias entry   %147 = bitcast i8* %145 to %struct.Comm*, !dbg !11487
+  alias entry   %158 = bitcast i8* %156 to double*, !dbg !11511
+  base alias entry   %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1
+  base alias entry   %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1
+  base alias entry   %192 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 2
+  base alias entry   %193 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 2
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias entry   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias entry   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias entry   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias entry   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias entry   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias entry   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias entry   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias entry   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias entry   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias entry   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias entry   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias entry   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias entry   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias entry   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias entry   %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1
+  base alias entry   %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1
+Warning: reach to function declaration __kmpc_fork_teams
+  alias entry (func arg) %struct.Comm* %1
+  alias entry (func arg) double* %2
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 1
+Round 1
+  base alias entry   %35 = bitcast i8** %34 to double**, !dbg !10317
+  base alias entry   %37 = bitcast i8** %36 to double**, !dbg !10317
+  base alias entry   %45 = bitcast i8** %44 to %struct.Comm**, !dbg !10317
+  base alias entry   %47 = bitcast i8** %46 to %struct.Comm**, !dbg !10317
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias entry   %29 = alloca i64*, align 8
+  base alias entry   %30 = alloca i64*, align 8
+  base alias offset entry (1)   %16 = alloca [3 x i8*], align 8
+  base alias offset entry (1)   %17 = alloca [3 x i8*], align 8
+  base alias offset entry (2)   %16 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %192 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 2
+  base alias offset entry (2)   %17 = alloca [3 x i8*], align 8
+  base alias offset entry (-1)   %193 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 2
+  base alias offset entry (1)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (1)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (2)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-1)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (2)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-1)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (3)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-2)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (-1)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (3)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-2)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (-1)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (-3)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (-2)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (-1)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (-3)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (-2)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (-1)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (-4)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (-3)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (-2)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (-4)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (-3)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (-2)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (6)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-5)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (-4)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (-3)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (6)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-5)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (-4)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (-3)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (7)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-6)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (-5)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (-4)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (-1)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (7)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-6)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (-5)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (-4)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (-1)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (8)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-7)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-6)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-5)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-2)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-1)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (8)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-7)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-6)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-5)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-2)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-1)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-8)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-7)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-6)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-3)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-2)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-1)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-8)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-7)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-6)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-3)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-2)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-1)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (10)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-9)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-8)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-7)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-4)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-3)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-2)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (10)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-9)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-8)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-7)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-4)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-3)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-2)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-10)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-9)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-8)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-5)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-4)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-3)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-1)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-10)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-9)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-8)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-5)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-4)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-3)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-1)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %189, align 8, !dbg !11559
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %190, align 8, !dbg !11559
+Warning: reach to function declaration __kmpc_fork_teams
+  alias entry   %263 = load i64*, i64** %29, align 8, !dbg !11584, !tbaa !11451
+  alias entry   %264 = load i64*, i64** %30, align 8, !dbg !11584, !tbaa !11451
+  alias entry   %274 = ptrtoint i64* %263 to i64, !dbg !11584
+  alias entry   %275 = ptrtoint i64* %264 to i64, !dbg !11584
+  base alias entry   %215 = bitcast i8** %214 to i64*
+  base alias entry   %217 = bitcast i8** %216 to i64*
+  base alias entry   %220 = bitcast i8** %219 to i64*
+  base alias entry   %222 = bitcast i8** %221 to i64*
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 2
+Warning: reach to function declaration __kmpc_fork_call
+Round 2
+  base alias entry   %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317
+  base alias entry   %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317
+  base alias entry   %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317
+  base alias entry   %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias offset entry (2)   %11 = alloca [5 x i8*], align 8
+  base alias offset entry (2)   %12 = alloca [5 x i8*], align 8
+  base alias offset entry (-1)   %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317
+  base alias offset entry (-1)   %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317
+  base alias offset entry (4)   %11 = alloca [5 x i8*], align 8
+  base alias offset entry (-2)   %44 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 4, !dbg !10317
+  base alias offset entry (4)   %12 = alloca [5 x i8*], align 8
+  base alias offset entry (-2)   %46 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 4, !dbg !10317
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias entry   %126 = bitcast i64** %29 to i8*, !dbg !11447
+  base alias entry   %139 = bitcast i64** %30 to i8*, !dbg !11477
+  base alias offset entry (1)   %184 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 0
+  base alias offset entry (2)   %184 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 0
+  base alias offset entry (1)   %186 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 0
+  base alias offset entry (2)   %186 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 0
+  base alias offset entry (1)   %189 = getelementptr inbounds [3 x i8*], [3 x i8*]* %16, i64 0, i64 1
+  base alias offset entry (1)   %190 = getelementptr inbounds [3 x i8*], [3 x i8*]* %17, i64 0, i64 1
+  base alias offset entry (1)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (2)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (3)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (6)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (7)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (8)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (10)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (1)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (2)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (3)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (6)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (7)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (8)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (10)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (1)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (2)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (5)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (6)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (7)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (9)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (1)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (2)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (5)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (6)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (7)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (9)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (1)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (4)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (5)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (6)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (8)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (1)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (4)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (5)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (6)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (8)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (3)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (4)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (5)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (7)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (3)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (4)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (5)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (7)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (2)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (3)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (4)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (6)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias entry   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (2)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (3)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (4)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (6)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias entry   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+  base alias offset entry (1)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (2)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (3)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (5)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias entry   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (1)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (2)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (3)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (5)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias entry   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (1)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (2)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (4)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (1)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (2)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (4)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (1)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (3)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (1)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (3)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (2)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (2)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (1)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (1)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %189, align 8, !dbg !11559
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %190, align 8, !dbg !11559
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 3
+Warning: reach to function declaration __kmpc_fork_call
+Round 3
+  base alias offset entry (4)   %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317
+  base alias offset entry (2)   %24 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 0, !dbg !10317
+  base alias offset entry (4)   %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317
+  base alias offset entry (2)   %26 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 0, !dbg !10317
+  base alias offset entry (3)   %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317
+  base alias offset entry (1)   %29 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 1, !dbg !10317
+  base alias offset entry (3)   %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317
+  base alias offset entry (1)   %31 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 1, !dbg !10317
+  base alias offset entry (2)   %34 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 2, !dbg !10317
+  base alias offset entry (2)   %36 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 2, !dbg !10317
+  base alias offset entry (1)   %39 = getelementptr inbounds [5 x i8*], [5 x i8*]* %11, i64 0, i64 3, !dbg !10317
+  base alias offset entry (1)   %41 = getelementptr inbounds [5 x i8*], [5 x i8*]* %12, i64 0, i64 3, !dbg !10317
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias offset entry (4)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (4)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (5)   %31 = alloca [12 x i8*], align 8
+  base alias offset entry (-1)   %219 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 5
+  base alias offset entry (5)   %32 = alloca [12 x i8*], align 8
+  base alias offset entry (-1)   %221 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 5
+  base alias offset entry (-2)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (-1)   %224 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 6
+  base alias offset entry (-2)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (-1)   %225 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 6
+  base alias offset entry (-3)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (-2)   %227 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 7
+  base alias offset entry (-3)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (-2)   %228 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 7
+  base alias offset entry (-4)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-3)   %230 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 8
+  base alias offset entry (-4)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-3)   %231 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 8
+  base alias offset entry (-5)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-4)   %233 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 9
+  base alias offset entry (-5)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-4)   %235 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 9
+  base alias offset entry (-6)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-5)   %238 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 10
+  base alias offset entry (-6)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-5)   %239 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 10
+  base alias offset entry (-7)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-6)   %241 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 11
+  base alias offset entry (-7)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+  base alias offset entry (-6)   %243 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 11
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %189, align 8, !dbg !11559
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %190, align 8, !dbg !11559
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 4
+Warning: reach to function declaration __kmpc_fork_call
+Round 4
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+  base alias offset entry (4)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (5)   %200 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 0
+  base alias offset entry (4)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (5)   %202 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 0
+  base alias offset entry (3)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (4)   %205 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 1
+  base alias offset entry (3)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (4)   %206 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 1
+  base alias offset entry (2)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (3)   %208 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 2
+  base alias offset entry (2)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (3)   %209 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 2
+  base alias offset entry (1)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (2)   %211 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 3
+  base alias offset entry (1)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (2)   %212 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 3
+  base alias offset entry (1)   %214 = getelementptr inbounds [12 x i8*], [12 x i8*]* %31, i64 0, i64 4
+  base alias offset entry (1)   %216 = getelementptr inbounds [12 x i8*], [12 x i8*]* %32, i64 0, i64 4
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %189, align 8, !dbg !11559
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %190, align 8, !dbg !11559
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 5
+Warning: reach to function declaration __kmpc_fork_call
+Round 5
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %189, align 8, !dbg !11559
+Warning: store a different alias pointer to a base pointer:   store i8* %156, i8** %190, align 8, !dbg !11559
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Warning: reach to function declaration __kmpc_fork_teams
+Info: add function _Z21distComputeModularityRK5GraphP4CommPKddi to Round 6
+Warning: reach to function declaration __kmpc_fork_call
+Round 6
+Warning: reach to function declaration __kmpc_fork_teams
+Round end
+  ---- Access Frequency Analysis ----
+  target call (1.625206e+01, 0.000000e+00, 5.076920e+00) using   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  target call (1.625206e+01, 0.000000e+00, 1.015380e+01) using   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+  target call (1.625204e+01, 1.015380e+01, 0.000000e+00) using   %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100)
+          to label %92 unwind label %291, !dbg !11387
+  target call (1.625204e+01, 5.076920e+00, 0.000000e+00) using   %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100)
+          to label %106 unwind label %295, !dbg !11405
+  target call (1.625204e+01, 8.757690e+01, 0.000000e+00) using   %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100)
+          to label %120 unwind label %299, !dbg !11431
+  target call (1.625204e+01, 4.569230e+01, 0.000000e+00) using   %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %129 unwind label %303, !dbg !11449
+  target call (1.625204e+01, 0.000000e+00, 5.076920e+00) using   %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %141 unwind label %311, !dbg !11478
+  target call (1.625204e+01, 3.807690e+00, 0.000000e+00) using   %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %135 unwind label %307, !dbg !11462
+  target call (1.625204e+01, 1.078710e+02, 0.000000e+00) using   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  target call (1.625204e+01, 2.538460e+00, 0.000000e+00) using   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+  target call (1.625204e+01, 2.538460e+00, 2.538460e+00) using   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+  target call (1.625202e+01, 1.015380e+01, 1.015380e+01) using   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  target call (1.625202e+01, 1.015380e+01, 0.000000e+00) using   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+  call (1.625199e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  call (1.625199e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00) using   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+Frequency of   %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100)
+          to label %92 unwind label %291, !dbg !11387
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.650200e+02		  store: 0.000000e+00 (target)
+Frequency of   %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100)
+          to label %106 unwind label %295, !dbg !11405
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 8.251031e+01		  store: 0.000000e+00 (target)
+Frequency of   %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100)
+          to label %120 unwind label %299, !dbg !11431
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.423303e+03		  store: 0.000000e+00 (target)
+Frequency of   %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %129 unwind label %303, !dbg !11449
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 7.425931e+02		  store: 0.000000e+00 (target)
+Frequency of   %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %135 unwind label %307, !dbg !11462
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 6.188273e+01		  store: 0.000000e+00 (target)
+Frequency of   %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %141 unwind label %311, !dbg !11478
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 8.251031e+01 (target)
+Frequency of   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.918144e+03		  store: 2.475302e+02 (target)
+Frequency of   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 2.062750e+02		  store: 1.650201e+02 (target)
+Frequency of   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 4.125515e+01		  store: 4.125515e+01 (target)
+  ---- Optimization Preparation ----
+Rank 9 for   %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %135 unwind label %307, !dbg !11462
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 6.188273e+01		  store: 0.000000e+00 (target)
+Rank 8 for   %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100)
+          to label %106 unwind label %295, !dbg !11405
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 8.251031e+01		  store: 0.000000e+00 (target)
+Rank 7 for   %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %141 unwind label %311, !dbg !11478
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 0.000000e+00		  store: 8.251031e+01 (target)
+Rank 6 for   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 4.125515e+01		  store: 4.125515e+01 (target)
+Rank 5 for   %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100)
+          to label %92 unwind label %291, !dbg !11387
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.650200e+02		  store: 0.000000e+00 (target)
+Rank 4 for   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 2.062750e+02		  store: 1.650201e+02 (target)
+Rank 3 for   %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %129 unwind label %303, !dbg !11449
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 7.425931e+02		  store: 0.000000e+00 (target)
+Rank 2 for   %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100)
+          to label %120 unwind label %299, !dbg !11431
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.423303e+03		  store: 0.000000e+00 (target)
+Rank 1 for   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+  load: 0.000000e+00		  store: 0.000000e+00 (host)
+  load: 1.918144e+03		  store: 2.475302e+02 (target)
+  ---- Data Mapping Optimization ----
+  target call:   %49 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z21distComputeModularityRK5GraphP4CommPKddi_l396.region_id, i32 5, i8** nonnull %24, i8** nonnull %26, i64* nonnull %28, i64* getelementptr inbounds ([5 x i64], [5 x i64]* @.offload_maptypes.0, i64 0, i64 0), i32 0, i32 0), !dbg !10317
+@.offload_maptypes.0 = private unnamed_addr constant [5 x i64] [i64 800, i64 547, i64 1100853829665, i64 547, i64 1102195986465]
+  arg 2 (0.000000e+00, 0.000000e+00; 4.125515e+01, 4.125515e+01) is   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+    size is   %127 = shl i64 %69, 3, !dbg !11448
+    global reuse is 0x06
+    local reuse is 1.600380e+02, 1.280304e+03 after adjustment;		    scaled local reuse is 0x500
+    reuse distance is 0x01
+  arg 4 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x01
+    local reuse is 1.600380e+02, 2.560608e+03 after adjustment;		    scaled local reuse is 0xa00
+    reuse distance is 0x01
+  target call:   %24 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %9, i8** nonnull %11, i64* nonnull %13, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15, i64 0, i64 0), i32 0, i32 0), !dbg !10269
+@.offload_maptypes.15 = private unnamed_addr constant [3 x i64] [i64 800, i64 35, i64 33]
+  target call:   %26 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %11, i8** nonnull %13, i64* nonnull %15, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20, i64 0, i64 0), i32 0, i32 0), !dbg !10269
+@.offload_maptypes.20 = private unnamed_addr constant [3 x i64] [i64 800, i64 34, i64 34]
+  target call:   %258 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z16distCleanCWandCUlPdP4Comm_l455.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.20.1, i64 0, i64 0), i32 0, i32 0)
+          to label %259 unwind label %319, !dbg !11559
+@.offload_maptypes.20.1 = private unnamed_addr constant [3 x i64] [i64 800, i64 1099553574946, i64 1099681513506]
+  arg 1 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x01
+    local reuse is 5.076920e+00, 4.061536e+01 after adjustment;		    scaled local reuse is 0x028
+    reuse distance is 0x01
+  arg 2 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x04
+    local reuse is 1.015380e+01, 1.624608e+02 after adjustment;		    scaled local reuse is 0x0a2
+    reuse distance is 0x01
+  target call:   %276 = call i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z17distLouvainMethodiiRK5GraphRmS2_RSt6vectorIlSaIlEES6_S6_S6_ddRi_l1368.region_id, i32 12, i8** nonnull %200, i8** nonnull %202, i64* nonnull %204, i64* getelementptr inbounds ([12 x i64], [12 x i64]* @.offload_maptypes.47.2, i64 0, i64 0), i32 0, i32 0), !dbg !11584
+@.offload_maptypes.47.2 = private unnamed_addr constant [12 x i64] [i64 800, i64 9895689605153, i64 9895646625825, i64 9897073713185, i64 9895987392545, i64 9895646621730, i64 9895636144161, i64 1101320425505, i64 1099553587235, i64 800, i64 9895646617635, i64 800]
+  arg 1 (0.000000e+00, 0.000000e+00; 1.650200e+02, 0.000000e+00) is   %91 = invoke i8* @omp_target_alloc(i64 %90, i32 signext -100)
+          to label %92 unwind label %291, !dbg !11387
+    size is   %90 = sub i64 %87, %89, !dbg !11386
+    global reuse is 0x05
+    local reuse is 1.015380e+01, 8.123040e+01 after adjustment;		    scaled local reuse is 0x051
+    reuse distance is 0x09
+  arg 2 (0.000000e+00, 0.000000e+00; 8.251031e+01, 0.000000e+00) is   %105 = invoke i8* @omp_target_alloc(i64 %104, i32 signext -100)
+          to label %106 unwind label %295, !dbg !11405
+    size is   %104 = sub i64 %101, %103, !dbg !11404
+    global reuse is 0x08
+    local reuse is 5.076920e+00, 4.061536e+01 after adjustment;		    scaled local reuse is 0x028
+    reuse distance is 0x09
+  arg 3 (0.000000e+00, 0.000000e+00; 1.423303e+03, 0.000000e+00) is   %119 = invoke i8* @omp_target_alloc(i64 %118, i32 signext -100)
+          to label %120 unwind label %299, !dbg !11431
+    size is   %118 = sub i64 %115, %117, !dbg !11430
+    global reuse is 0x02
+    local reuse is 8.757690e+01, 1.401230e+03 after adjustment;		    scaled local reuse is 0x579
+    reuse distance is 0x09
+  arg 4 (0.000000e+00, 0.000000e+00; 7.425931e+02, 0.000000e+00) is   %128 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %129 unwind label %303, !dbg !11449
+    size is   %127 = shl i64 %69, 3, !dbg !11448
+    global reuse is 0x03
+    local reuse is 4.569230e+01, 3.655384e+02 after adjustment;		    scaled local reuse is 0x16d
+    reuse distance is 0x09
+  arg 5 (0.000000e+00, 0.000000e+00; 0.000000e+00, 8.251031e+01) is   %140 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %141 unwind label %311, !dbg !11478
+    size is   %127 = shl i64 %69, 3, !dbg !11448
+    global reuse is 0x07
+    local reuse is 5.076920e+00, 4.061536e+01 after adjustment;		    scaled local reuse is 0x028
+    reuse distance is 0x09
+  arg 6 (0.000000e+00, 0.000000e+00; 6.188273e+01, 0.000000e+00) is   %134 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %135 unwind label %307, !dbg !11462
+    size is   %127 = shl i64 %69, 3, !dbg !11448
+    global reuse is 0x09
+    local reuse is 3.807690e+00, 3.046152e+01 after adjustment;		    scaled local reuse is 0x01e
+    reuse distance is 0x09
+  arg 7 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x01
+    local reuse is 1.078710e+02, 1.725936e+03 after adjustment;		    scaled local reuse is 0x6bd
+    reuse distance is 0x01
+  arg 8 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x04
+    local reuse is 2.538460e+00, 4.061536e+01 after adjustment;		    scaled local reuse is 0x028
+    reuse distance is 0x01
+  arg 10 (0.000000e+00, 0.000000e+00; 4.125515e+01, 4.125515e+01) is   %156 = invoke i8* @omp_target_alloc(i64 %127, i32 signext -100)
+          to label %157 unwind label %317, !dbg !11510
+    size is   %127 = shl i64 %69, 3, !dbg !11448
+    global reuse is 0x06
+    local reuse is 5.076920e+00, 4.061536e+01 after adjustment;		    scaled local reuse is 0x028
+    reuse distance is 0x09
+  target call:   %325 = invoke i32 @__tgt_target_teams(i64 -1, i8* nonnull @.__omp_offloading_33_128194f__Z20distUpdateLocalCinfolP4CommPKS__l436.region_id, i32 3, i8** nonnull %184, i8** nonnull %186, i64* nonnull %188, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @.offload_maptypes.15.3, i64 0, i64 0), i32 0, i32 0)
+          to label %326 unwind label %319, !dbg !11667
+@.offload_maptypes.15.3 = private unnamed_addr constant [3 x i64] [i64 800, i64 7696921137187, i64 7696751280161]
+  arg 1 (0.000000e+00, 0.000000e+00; 1.918144e+03, 2.475302e+02) is   %145 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %146 unwind label %313, !dbg !11486
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x01
+    local reuse is 2.030760e+01, 3.249216e+02 after adjustment;		    scaled local reuse is 0x144
+    reuse distance is 0x07
+  arg 2 (0.000000e+00, 0.000000e+00; 2.062750e+02, 1.650201e+02) is   %151 = invoke i8* @omp_target_alloc(i64 %144, i32 signext -100)
+          to label %152 unwind label %315, !dbg !11502
+    size is   %144 = shl i64 %69, 4, !dbg !11485
+    global reuse is 0x04
+    local reuse is 1.015380e+01, 1.624608e+02 after adjustment;		    scaled local reuse is 0x0a2
+    reuse distance is 0x07
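Reviewer note on the log above: the large `@.offload_maptypes` constants appear to pack the reported reuse statistics above the ordinary libomptarget map-type flags. For example, 1100853829665 is 0x10050006021: flags 0x021, global reuse 0x06, scaled local reuse 0x500, reuse distance 0x01, matching the `arg 2` entry of the first target call. The sketch below decodes that layout. The shift positions (12, 20, 40) are inferred from the values printed in this log, not taken from the tool's source, and the field widths and the names `MapTypeFields` and `decodeMapType` are assumptions made for illustration.

```cpp
#include <cstdint>

// Low flag bits as defined by libomptarget (OMP_TGT_MAPTYPE_*).
constexpr std::uint64_t kMapTo          = 0x001;
constexpr std::uint64_t kMapFrom        = 0x002;
constexpr std::uint64_t kMapTargetParam = 0x020;
constexpr std::uint64_t kMapLiteral     = 0x100;
constexpr std::uint64_t kMapImplicit    = 0x200;

// Hypothetical view of one map-type entry as emitted in the log.
struct MapTypeFields {
    std::uint64_t flags;        // bits 0..11: standard map-type flags
    std::uint64_t globalReuse;  // bits 12..19 (width assumed)
    std::uint64_t localReuse;   // bits 20..39 (width assumed)
    std::uint64_t reuseDist;    // bits 40 and up (inferred from the log)
};

// Split a 64-bit map-type value into the fields described above.
constexpr MapTypeFields decodeMapType(std::uint64_t v)
{
    return MapTypeFields{ v & 0xfff, (v >> 12) & 0xff, (v >> 20) & 0xfffff, v >> 40 };
}
```

Under this reading, the unmodified entries decode to plain flags: 547 is TO | FROM | TARGET_PARAM | IMPLICIT, and 800 is TARGET_PARAM | LITERAL | IMPLICIT.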
diff --git a/miniVite/main.cpp b/miniVite/main.cpp
new file mode 100644
index 0000000..eb695f3
--- /dev/null
+++ b/miniVite/main.cpp
@@ -0,0 +1,252 @@
+// ***********************************************************************
+//
+//                              miniVite
+//
+// ***********************************************************************
+//
+//       Copyright (2018) Battelle Memorial Institute
+//                      All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions
+// are met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the copyright holder nor the names of its
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+// COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+// BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+// ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+// POSSIBILITY OF SUCH DAMAGE.
+//
+// ************************************************************************ 
+
+
+#include <sys/resource.h>
+#include <sys/time.h>
+#include <unistd.h>
+
+#include <cassert>
+#include <cstdlib>
+
+#include <fstream>
+#include <iostream>
+#include <sstream>
+#include <string>
+
+#include <omp.h>
+#include <mpi.h>
+
+//#include "dspl.hpp"
+//#include "dspl_gpu.hpp"
+#include "dspl_gpu_kernel.hpp"
+
+static std::string inputFileName;
+static int me, nprocs;
+static int ranksPerNode = 1;
+static GraphElem nvRGG = 0;
+static bool generateGraph = false;
+static int randomEdgePercent = 0;
+static bool randomNumberLCG = false;
+static bool isUnitEdgeWeight = true;
+static double threshold = 1.0E-6;
+
+// parse command line parameters
+static void parseCommandLine(const int argc, char * const argv[]);
+
+int main(int argc, char *argv[])
+{
+  double t0, t1, t2, t3, ti = 0.0;
+  
+  MPI_Init(&argc, &argv);
+
+  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
+  MPI_Comm_rank(MPI_COMM_WORLD, &me);
+
+  parseCommandLine(argc, argv);
+
+  createCommunityMPIType();
+  double td0, td1, td, tdt;
+
+  MPI_Barrier(MPI_COMM_WORLD);
+  td0 = MPI_Wtime();
+
+  Graph* g = nullptr;
+
+  // generate graph only supports RGG as of now
+  if (generateGraph) { 
+      GenerateRGG gr(nvRGG);
+      g = gr.generate(randomNumberLCG, isUnitEdgeWeight, randomEdgePercent);
+      //g->print(false);
+
+      if (me == 0) {
+          std::cout << "**********************************************************************" << std::endl;
+          std::cout << "Generated Random Geometric Graph with d: " << gr.get_d() << std::endl;
+#ifndef PRINT_DIST_STATS
+          const GraphElem nv = g->get_nv();
+          const GraphElem ne = g->get_ne();
+          std::cout << "Number of vertices: " << nv << std::endl;
+          std::cout << "Number of edges: " << ne << std::endl;
+#endif
+          //std::cout << "Sparsity: "<< (double)((double)nv / (double)(nvRGG*nvRGG))*100.0 <<"%"<< std::endl;
+          //std::cout << "Average degree: " << (ne / nv) << std::endl;
+      }
+      
+      MPI_Barrier(MPI_COMM_WORLD);
+  }
+  else { // read input graph
+      BinaryEdgeList rm;
+      g = rm.read(me, nprocs, ranksPerNode, inputFileName);
+      //g->print();
+  }
+
+  assert(g != nullptr);
+#ifdef PRINT_DIST_STATS
+  g->print_dist_stats();
+#endif
+
+  MPI_Barrier(MPI_COMM_WORLD);
+#ifdef DEBUG_PRINTF  
+  assert(g);
+#endif  
+  td1 = MPI_Wtime();
+  td = td1 - td0;
+
+  MPI_Reduce(&td, &tdt, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
+ 
+  if (me == 0)  {
+      if (!generateGraph)
+          std::cout << "Time to read input file and create distributed graph (in s): " 
+              << (tdt/nprocs) << std::endl;
+      else
+          std::cout << "Time to generate distributed graph of " 
+              << nvRGG << " vertices (in s): " << (tdt/nprocs) << std::endl;
+  }
+
+  double currMod = -1.0;
+  double prevMod = -1.0;
+  double total = 0.0;
+
+  std::vector<GraphElem> ssizes, rsizes, svdata, rvdata;
+#if defined(USE_MPI_RMA)
+  MPI_Win commwin;
+#endif
+  size_t ssz = 0, rsz = 0;
+  int iters = 0;
+    
+  MPI_Barrier(MPI_COMM_WORLD);
+
+  t1 = MPI_Wtime();
+
+  std::cout << "Size: " << sizeof(Edge) << " : " << sizeof(GraphElem) << std::endl;
+#if defined(USE_MPI_RMA)
+  currMod = distLouvainMethod(me, nprocs, *g, ssz, rsz, ssizes, rsizes, 
+                svdata, rvdata, currMod, threshold, iters, commwin);
+#else
+  currMod = distLouvainMethod(me, nprocs, *g, ssz, rsz, ssizes, rsizes, 
+                svdata, rvdata, currMod, threshold, iters);
+#endif
+  MPI_Barrier(MPI_COMM_WORLD);
+  t0 = MPI_Wtime();
+  
+  if(me == 0) {
+      std::cout << "Modularity: " << currMod << ", Iterations: " 
+          << iters << ", Time (in s): "<<t0-t1<< std::endl;
+
+      std::cout << "**********************************************************************" << std::endl;
+  }
+
+  MPI_Barrier(MPI_COMM_WORLD);
+
+  double tot_time = 0.0;
+  MPI_Reduce(&total, &tot_time, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
+  
+  delete g;
+  destroyCommunityMPIType();
+
+  MPI_Finalize();
+
+  return 0;
+} // main
+
+void parseCommandLine(const int argc, char * const argv[])
+{
+  int ret;
+
+  while ((ret = getopt(argc, argv, "f:r:t:n:wlp:")) != -1) {
+    switch (ret) {
+    case 'f':
+      inputFileName.assign(optarg);
+      break;
+    case 'r':
+      ranksPerNode = atoi(optarg);
+      break;
+    case 't':
+      threshold = atof(optarg);
+      break;
+    case 'n':
+      nvRGG = atol(optarg);
+      if (nvRGG > 0)
+          generateGraph = true; 
+      break;
+    case 'w':
+      isUnitEdgeWeight = false;
+      break;
+    case 'l':
+      randomNumberLCG = true;
+      break;
+    case 'p':
+      randomEdgePercent = atoi(optarg);
+      break;
+    default:
+      assert(0 && "Should not reach here!!");
+      break;
+    }
+  }
+
+  if (me == 0 && (argc == 1)) {
+      std::cerr << "Must specify some options." << std::endl;
+      MPI_Abort(MPI_COMM_WORLD, -99);
+  }
+  
+  if (me == 0 && !generateGraph && inputFileName.empty()) {
+      std::cerr << "Must specify a binary file name with -f or provide parameters for generating a graph." << std::endl;
+      MPI_Abort(MPI_COMM_WORLD, -99);
+  }
+   
+  if (me == 0 && !generateGraph && randomNumberLCG) {
+      std::cerr << "Must specify -n for graph generation using LCG." << std::endl;
+      MPI_Abort(MPI_COMM_WORLD, -99);
+  } 
+   
+  if (me == 0 && !generateGraph && randomEdgePercent) {
+      std::cerr << "Must specify -n for graph generation first to add random edges to it." << std::endl;
+      MPI_Abort(MPI_COMM_WORLD, -99);
+  } 
+  
+  if (me == 0 && !generateGraph && !isUnitEdgeWeight) {
+      std::cerr << "Must specify -n for graph generation first before setting edge weights." << std::endl;
+      MPI_Abort(MPI_COMM_WORLD, -99);
+  }
+  
+  if (me == 0 && generateGraph && ((randomEdgePercent < 0) || (randomEdgePercent >= 100))) {
+      std::cerr << "Invalid random edge percentage for generated graph!" << std::endl;
+      MPI_Abort(MPI_COMM_WORLD, -99);
+  }
+} // parseCommandLine
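Reviewer note on `parseCommandLine`: the `-r`, `-n`, and `-p` values go through `atoi`/`atol`, which return 0 on malformed input, so an argument such as `-n 5e7` is read as 5 and `-n abc` silently leaves graph generation disabled. A minimal sketch of a stricter parse, assuming a hypothetical helper `parseLongChecked` that is not part of miniVite:

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper (not in miniVite): parse a base-10 integer and report
// failure instead of silently returning a partial or zero value.
inline bool parseLongChecked(const char* text, long& out)
{
    assert(text != nullptr);
    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(text, &end, 10);
    // Reject empty input, trailing characters, and out-of-range values.
    if (end == text || *end != '\0' || errno == ERANGE)
        return false;
    out = value;
    return true;
}
```

A caller could then abort with a clear message when `parseLongChecked(optarg, value)` returns false, rather than continuing with a truncated or zero value.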
diff --git a/miniVite/miniVite b/miniVite/miniVite
new file mode 100755
index 0000000..3b3b798
Binary files /dev/null and b/miniVite/miniVite differ
diff --git a/miniVite/miniVite_alloc b/miniVite/miniVite_alloc
new file mode 100755
index 0000000..3b3b798
Binary files /dev/null and b/miniVite/miniVite_alloc differ
diff --git a/miniVite/miniVite_noalloc b/miniVite/miniVite_noalloc
new file mode 100755
index 0000000..a0f31ef
Binary files /dev/null and b/miniVite/miniVite_noalloc differ
diff --git a/miniVite/run b/miniVite/run
new file mode 100644
index 0000000..7d738a1
--- /dev/null
+++ b/miniVite/run
@@ -0,0 +1,15 @@
+LIBOMPTARGET_DEBUG=1 LLD_GPU_MODE=SDEV bsub -nnodes 1 -P GEN010SOLLVE -J km -W 120 -q batch -o log jsrun -n 1 -g 6 nvprof ./miniVite -n 50000000
+LLD_GPU_MODE=UM mpirun -n 1 nvprof ./miniVite -n 50000000
+
+grep Time: summit/alloc_032819_large.log | awk '{print $2}' | v2m 11 3 2
+grep "Host To Device" summit/alloc_032819_sm.log | awk '{print $6}' | awk -F "m" '{print $1}' | v2m 4 2 3
+grep "Device To Host" summit/alloc_032819_sm.log | awk '{print $6}' | awk -F "m" '{print $1}' | v2m 5
+grep "Gpu page fault groups" summit/alloc_032819_sm.log | awk '{print $6}' | awk -F "m|s" '{print $1}'
+grep "cuMemPrefetchAsync" summit/alloc_032819_sm.log | awk '{print $2}' | awk -F "m|s" '{print $1}'
+grep "cuMemPrefetchAsync" summit/alloc_032819_sm.log | awk '{print $4}' | awk -F "m|s" '{print $1}'
+grep "cuMemcpyHtoD" summit/alloc_032819_sm.log | awk '{print $4}' | awk -F "m|s" '{print $1}' | v2m 4 4 3
+
+grep "Host To Device" summit/alloc_032819_large.log | awk '{print $6}' | awk -F "m" '{print $1}' | v2m 11 2 3
+grep "Device To Host" summit/alloc_032819_large.log | awk '{print $6}' | awk -F "m" '{print $1}' | v2m 11 2 3
+grep "Gpu page fault groups" summit/alloc_032819_large.log | awk '{print $6}' | awk -F "m|s" '{print $1}'| v2m 11 3 3
+grep "Host To Device" summit/alloc_032819_large.log | awk '{print $5}' | awk -F "G" '{print $1}' | v2m 11 2 3
diff --git a/miniVite/run.sh b/miniVite/run.sh
new file mode 100644
index 0000000..e4b907a
--- /dev/null
+++ b/miniVite/run.sh
@@ -0,0 +1,32 @@
+#!/bin/bash
+
+log0="summit/alloc_051919_lru_sm.log"
+log1="summit/alloc_051919_lru_la.log"
+
+cd /ccs/home/lld/apps/miniVite
+
+for(( j=0; j<3; j++ ))
+do
+  bsub -o $log0 submit_sm.lsf
+  sleep 1
+  job_num=`bjob | grep lld | grep mnV | wc -l`
+  while [ $job_num -ne 0 ]
+  do
+    sleep 20
+    job_num=`bjob | grep lld | grep mnV | wc -l`
+  done
+  for(( i=50000000; i<=150000000; i+=10000000 ))
+  do
+    sed "s/input/$i/" < submit_one.lsf > temp.lsf
+    bsub -o $log1 temp.lsf
+    sleep 1
+    job_num=`bjob | grep lld | grep mnV | wc -l`
+    while [ $job_num -ne 0 ]
+    do
+      sleep 20
+      job_num=`bjob | grep lld | grep mnV | wc -l`
+    done
+  done
+done
+
+cd -
diff --git a/miniVite/run2.sh b/miniVite/run2.sh
new file mode 100644
index 0000000..4bb64d5
--- /dev/null
+++ b/miniVite/run2.sh
@@ -0,0 +1,21 @@
+#!/bin/bash
+
+log1="summit/all_032619_1_2.log"
+log2="summit/all_032619_2_2.log"
+
+cd /ccs/home/lld/apps/miniVite
+
+for(( j=0; j<3; j++ ))
+do
+  bsub -o $log1 submit_mid2.lsf
+  bsub -o $log2 submit_hu.lsf
+  sleep 1
+  job_num=`bjob | grep lld | wc -l`
+  while [ $job_num -ne 0 ]
+  do
+    sleep 30
+    job_num=`bjob | grep lld | wc -l`
+  done
+done
+
+cd -
diff --git a/miniVite/stats b/miniVite/stats
new file mode 100644
index 0000000..57aaf9c
--- /dev/null
+++ b/miniVite/stats
@@ -0,0 +1,2 @@
+grep Time: summit/alloc_040419_sm.log | awk '{print $2}' | v2m 4 7 3
+grep Time: summit/alloc_040519_la.log | awk '{print $2}' | v2m 11 6 3
diff --git a/miniVite/submit.sh b/miniVite/submit.sh
new file mode 100644
index 0000000..a039b51
--- /dev/null
+++ b/miniVite/submit.sh
@@ -0,0 +1,23 @@
+#!/bin/bash
+
+log="summit/all_032619.log"
+opt="-nnodes 1 -P GEN010SOLLVE -J km -W 120 -q batch -o $log"
+
+cd /ccs/home/lld/apps/miniVite
+
+for(( j=0; j<3; j++ ))
+do
+  for(( i=5000000; i<=150000000; i+=5000000 ))
+  do
+    LLD_GPU_MODE=UM bsub $opt jsrun -n1 -g6 nvprof ./miniVite -n $i
+    sleep 1
+    job_num=`bjob | grep lld | wc -l`
+    while [ $job_num -ne 0 ]
+    do
+      sleep 30
+      job_num=`bjob | grep lld | wc -l`
+    done
+  done
+done
+
+cd -
diff --git a/miniVite/utils.hpp b/miniVite/utils.hpp
new file mode 100644
index 0000000..50337e4
--- /dev/null
+++ b/miniVite/utils.hpp
@@ -0,0 +1,328 @@
+// ***********************************************************************
+//
+//                              miniVite
+//
+// ***********************************************************************
+//
+//       Copyright (2018) Battelle Memorial Institute
+//                      All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions
+// are met:
+//
+// 1. Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//
+// 2. Redistributions in binary form must reproduce the above copyright
+// notice, this list of conditions and the following disclaimer in the
+// documentation and/or other materials provided with the distribution.
+//
+// 3. Neither the name of the copyright holder nor the names of its
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+// FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+// COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+// BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+// LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+// ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+// POSSIBILITY OF SUCH DAMAGE.
+//
+// ************************************************************************ 
+
+#pragma once
+#ifndef UTILS_HPP
+#define UTILS_HPP
+
+#define PI                          (3.14159)
+#define MAX_PRINT_NEDGE             (100000)
+
+// See https://en.wikipedia.org/wiki/Linear_congruential_generator#Period_length
+// for the choice of LCG parameters.
+// These values are from Numerical Recipes.
+// TODO FIXME investigate larger periods
+#define MLCG                        (2147483647)    // 2^31 - 1
+#define ALCG                        (16807)         // 7^5
+#define BLCG                        (0)
+
+#define SR_UP_TAG                   100
+#define SR_DOWN_TAG                 101
+#define SR_SIZES_UP_TAG             102
+#define SR_SIZES_DOWN_TAG           103
+#define SR_X_UP_TAG                 104
+#define SR_X_DOWN_TAG               105
+#define SR_Y_UP_TAG                 106
+#define SR_Y_DOWN_TAG               107
+#define SR_LCG_TAG                  108
+
+#include <random>
+#include <utility>
+#include <cstring>
+
+#ifdef USE_32_BIT_GRAPH
+using GraphElem = int32_t;
+using GraphWeight = float;
+const MPI_Datatype MPI_GRAPH_TYPE = MPI_INT32_T;
+const MPI_Datatype MPI_WEIGHT_TYPE = MPI_FLOAT;
+#else
+using GraphElem = int64_t;
+using GraphWeight = double;
+const MPI_Datatype MPI_GRAPH_TYPE = MPI_INT64_T;
+const MPI_Datatype MPI_WEIGHT_TYPE = MPI_DOUBLE;
+#endif
+
+extern unsigned seed;
+
+// Is nprocs a power-of-2?
+int is_pwr2(int nprocs) 
+{ return ((nprocs != 0) && !(nprocs & (nprocs - 1))); }
+
+// Return a reseeded value (a uint32_t from std::seed_seq, widened to GraphElem)
+GraphElem reseeder(unsigned initseed)
+{
+    std::seed_seq seq({initseed});
+    std::vector<std::uint32_t> seeds(1);
+    seq.generate(seeds.begin(), seeds.end());
+
+    return (GraphElem)seeds[0];
+}
+
+// Local random number generator 
+template<typename T, typename G = std::default_random_engine>
+T genRandom(T lo, T hi)
+{
+    thread_local static G gen(seed);
+    using Dist = typename std::conditional
+        <
+        std::is_integral<T>::value
+        , std::uniform_int_distribution<T>
+        , std::uniform_real_distribution<T>
+        >::type;
+
+    thread_local static Dist utd {};
+    return utd(gen, typename Dist::param_type{lo, hi});
+}
+
+// Parallel Linear Congruential Generator
+// x[i] = (a*x[i-1] + b)%M
+class LCG
+{
+    public:
+        LCG(unsigned seed, GraphWeight* drand, 
+            GraphElem n, MPI_Comm comm = MPI_COMM_WORLD): 
+        seed_(seed), drand_(drand), n_(n)
+        {
+            comm_ = comm;
+            MPI_Comm_size(comm_, &nprocs_);
+            MPI_Comm_rank(comm_, &rank_);
+
+            // allocate long random numbers
+            rnums_.resize(n_);
+
+            // init x0
+            if (rank_ == 0)
+                x0_ = reseeder(seed_);
+
+            // step #1: bcast x0 from root
+            MPI_Bcast(&x0_, 1, MPI_GRAPH_TYPE, 0, comm_);
+            
+            // step #2: parallel prefix to generate first random value per process
+            parallel_prefix_op();
+        }
+        
+        ~LCG() { rnums_.clear(); }
+
+        // 2x2 matrix-matrix multiply, reduced mod MLCG so entries stay below 2^31
+        void matmat_2x2(GraphElem c[], GraphElem a[], GraphElem b[])
+        {
+            for (int i = 0; i < 2; i++) {
+                for (int j = 0; j < 2; j++) {
+                    GraphElem sum = 0;
+                    for (int k = 0; k < 2; k++) {
+                        sum = (sum + (a[i*2+k]*b[k*2+j])%MLCG)%MLCG;
+                    }
+                    c[i*2+j] = sum;
+                }
+            }
+        }
+
+        // x *= y
+        void matop_2x2(GraphElem x[], GraphElem y[])
+        {
+            GraphElem tmp[4];
+            matmat_2x2(tmp, x, y);
+            memcpy(x, tmp, sizeof(GraphElem[4]));
+        }
+
+        // find kth power of a 2x2 matrix
+        void mat_power(GraphElem mat[], GraphElem k)
+        {
+            GraphElem tmp[4];
+            memcpy(tmp, mat, sizeof(GraphElem[4]));
+
+            // multiply by the original matrix k-1 times, leaving mat = mat^k
+            for (GraphElem p = 0; p < k-1; p++)
+                matop_2x2(mat, tmp);
+        }
+
+        // parallel prefix over 2x2 matrix products (nprocs_ must be a power of 2)
+        // x0_ is the very first random number in the series
+        // global_op holds the LCG transform [a 0; b 1] in row-major order
+        // n_ is (n/p), the number of random numbers per process
+        // rnums_ is an n_-length array which stores the random nums for a process
+        void parallel_prefix_op()
+        {
+            GraphElem global_op[4]; 
+            global_op[0] = ALCG;
+            global_op[1] = 0;
+            global_op[2] = BLCG;
+            global_op[3] = 1;
+
+            mat_power(global_op, n_);        // M^(n/p)
+            GraphElem prefix_op[4] = {1,0,0,1};  // I in row-major
+
+            GraphElem global_op_recv[4];
+
+            int steps = (int)(log2((double)nprocs_));
+
+            for (int s = 0; s < steps; s++) {
+                
+                int mate = rank_^(1 << s); // toggle the sth LSB to find my neighbor
+                
+                // send/recv global to/from mate
+                MPI_Sendrecv(global_op, 4, MPI_GRAPH_TYPE, mate, SR_LCG_TAG, 
+                        global_op_recv, 4, MPI_GRAPH_TYPE, mate, SR_LCG_TAG, 
+                        comm_, MPI_STATUS_IGNORE);
+
+                matop_2x2(global_op, global_op_recv);   
+                
+                if (mate < rank_) 
+                    matop_2x2(prefix_op, global_op_recv);
+
+                MPI_Barrier(comm_);
+            }
+
+            // populate the first random number entry for each process
+            // (x0*a + b)%M, with a and b taken from the prefix matrix
+            if (rank_ == 0)
+                rnums_[0] = x0_;
+            else
+                rnums_[0] = (x0_*prefix_op[0] + prefix_op[2])%MLCG;
+        }
+
+        // generate random number based on the first 
+        // random number on a process
+        // TODO check the 'quick'n dirty generators to
+        // see if we can avoid the mod
+        void generate()
+        {
+#if defined(PRINT_LCG_LONG_RANDOM_NUMBERS)
+            for (int k = 0; k < nprocs_; k++) {
+                if (k == rank_) {
+                    std::cout << "------------" << std::endl;
+                    std::cout << "Process#" << rank_ << " :" << std::endl;
+                    std::cout << "------------" << std::endl;
+                    std::cout << rnums_[0] << std::endl;
+                    for (GraphElem i = 1; i < n_; i++) {
+                        rnums_[i] = (rnums_[i-1]*ALCG + BLCG)%MLCG;
+                        std::cout << rnums_[i] << std::endl;
+                    }
+                }
+                MPI_Barrier(comm_);
+            }
+#else
+            for (GraphElem i = 1; i < n_; i++) {
+                rnums_[i] = (rnums_[i-1]*ALCG + BLCG)%MLCG;
+            }
+#endif
+            GraphWeight mult = 1.0 / (GraphWeight)(1.0 + (GraphWeight)(MLCG-1));
+
+#if defined(PRINT_LCG_DOUBLE_RANDOM_NUMBERS)
+            for (int k = 0; k < nprocs_; k++) {
+                if (k == rank_) {
+                    std::cout << "------------" << std::endl;
+                    std::cout << "Process#" << rank_ << " :" << std::endl;
+                    std::cout << "------------" << std::endl;
+
+                    for (GraphElem i = 0; i < n_; i++) {
+                        drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult ); // 0-1
+                        std::cout << drand_[i] << std::endl;
+                    }
+                }
+                MPI_Barrier(comm_);
+            }
+#else
+            for (GraphElem i = 0; i < n_; i++)
+                drand_[i] = (GraphWeight)((GraphWeight)fabs(rnums_[i]) * mult); // 0-1
+#endif
+        }
+         
+        // copy from drand_[idx_start] onward to new_drand,
+        // rescaling each number into [lo, lo + 1/nprocs_]
+        void rescale(GraphWeight* new_drand, GraphElem idx_start, GraphWeight const& lo)
+        {
+            GraphWeight range = (1.0 / (GraphWeight)nprocs_);
+
+#if defined(PRINT_LCG_DOUBLE_LOHI_RANDOM_NUMBERS)
+            for (int k = 0; k < nprocs_; k++) {
+                if (k == rank_) {
+                    std::cout << "------------" << std::endl;
+                    std::cout << "Process#" << rank_ << " :" << std::endl;
+                    std::cout << "------------" << std::endl;
+
+                    for (GraphElem i = idx_start, j = 0; i < n_; i++, j++) {
+                        new_drand[j] = lo + (GraphWeight)(range * drand_[i]);
+                        std::cout << new_drand[j] << std::endl;
+                    }
+                }
+                MPI_Barrier(comm_);
+            }
+#else
+            for (GraphElem i = idx_start, j = 0; i < n_; i++, j++)
+                new_drand[j] = lo + (GraphWeight)(range * drand_[i]); // lo to lo + 1/nprocs_
+#endif
+        }
+
+    private:
+        MPI_Comm comm_;
+        int nprocs_, rank_;
+        unsigned seed_;
+        GraphElem n_, x0_;
+        GraphWeight* drand_;
+        std::vector<GraphElem> rnums_;
+};
+
+// locks
+#ifdef USE_OPENMP_LOCK
+#else
+#ifdef USE_SPINLOCK 
+#include <atomic>
+std::atomic_flag lkd_ = ATOMIC_FLAG_INIT;
+#else
+#include <mutex>
+std::mutex mtx_;
+#endif
+void lock() {
+#ifdef USE_SPINLOCK 
+    while (lkd_.test_and_set(std::memory_order_acquire)) { ; } 
+#else
+    mtx_.lock();
+#endif
+}
+void unlock() { 
+#ifdef USE_SPINLOCK 
+    lkd_.clear(std::memory_order_release); 
+#else
+    mtx_.unlock();
+#endif
+}
+#endif
+
+#endif // UTILS_HPP