VersionHistory.txt

Version History for OptQC (Redone_v1+)

v1.0:
- Copied over files from Release version (v0.62F).
- Frustrated with stupidity of Fortran character arrays. Decided to replace the len=20 (i.e. 20 qubit limitation) by means of a new object.
- Created cstr and binstr (extension of cstr) for said purpose in OptQC_Common_Module.f90.
- Removed GrayCodeDistance function from OptQC_Common.f90 - obsolete.
- Ported over QSimilar function to be a member function of binstr - have not deleted the original yet though. Waiting till next version to impose integration with everything else.
- Decided to store current progress as v1.0 - integration of binstr incoming.

v1.01:
- Ported and removed QSimilar and IsEmpty functions into the binstr object.
- Integrated usage of binstr object into CSD code.
- Tested and WARGH SHO MUCH DEBUGGING.
- Verified results against original program by checking the raw decomposition (without optimization), which matches. Numbers for optimized result looks about right - haven't checked validity, but should be.
- Decided to store current progress as v1.01 - next up, changing optimization procedure.

v1.1:
- Added the function binstr_getdecrep to compute the decimal representation of the bitstring object.
- Wrapped contnt of OptQC_Perm in a module because reasons.
- Reshuffled some code between the main files with the bridge file.
- Wrote up subroutines to generate random permutation gates and to find the corresponding permutation representation - tested and verified.
- TO NOTE: I think the convention is for (1)-(5), it is Q^T P^T U' P Q - refer to csd_solution_set_write_circuit() documentation
- Fixed small bug in qperm_reverse() subroutine - j index was meant to go up to N+1 instead of N
- WAIT WHAT WEIRD BUG WITH OUTPUTING FILE NAME - PLEASE FIX OR CONFIRM - CONFIRMED TO EXIST ONLY IN SUBLIME TEXT DISPLAY???
- Finished writing code that replaces neighbourhood operator with a new one that builds up P using conditional NOT gates - to be played with.
- Found major bug in edited csd_generator_reducesolution() subroutine introduced by replacing len=20 character arrays with the dynamic binstr object - causes floating lines, incorrect conditionals and other bad bad things.
- Fixed major bug by spending ages figuring out old code - issue was with indexing in string vs the circuit convention.
- Verified results for the S8 graph.
- NOTE (COURTESY OF ERNEST): PUT PONIES IN AT SOME POINT.
- Tested a variant of the generategate() subroutine which limits NOT gates to 1 per gate - performance seems to be slightly worse, but merits further investigation.
- Results unanimously agree that the results generated in this fashion is better than the standard neighbourhood operator used in v0.*, but...
- ...plots of history indicate a Nike (just do it) trend - the history plot looks like a tick, i.e. drop to the optimal solution, followed by a linear increase (rate presumably being the threshold maximum). Probably because the new neighbourhood operator allows for slower increases? Either way, it is a problem.
- Decided to store current progress as v1.02 - next up, looking into either (a) going back to a standard simulated annealing algorithm, or (b) modifying the threshold based acceptance algorithm to force decay.

v1.11:
- Testing reveals a greater issue with the simulated annealing process - a pure random walk (i.e. threshold = 0) gives much better results. Checking to see if this is caused by the current threshold algorithm combined with the new neighbourhood operator, or if it is also an issue with the old neighbourhood operator.
- Either way, probs have to keep in mind to (1) be suspicious when the optimal result is close to the beginning, and (2) to always check the annealing results against a pure random walk first.
- Okay yeah gg lol - also an issue with the old neighbourhood operator (verified for the 3CT3 operator) - i.e. the simulated annealing process actually hurts more than helping.
- Note: No code changes up to this point.
- Noticed a performance issue with using the Fortran intrinsic random_number() for random numbers - from the documentation: 'However, the KISS generator does not create random numbers in parallel from multiple sources, but in sequence from a single source. If an OpenMP-enabled application heavily relies on random numbers, one should consider employing a dedicated parallel random number generator instead.' Hence, probably should switch it to an independent RNG for each thread.
- Created a new source file OptQC_RNG.f90 which defines the rng module. Source code derived from 'http://jblevins.org/log/openmp'. 
- Tested new module - adapted the seeding to generate fairly different sequences on different MPI threads. Integrated it with existing code, and edited Makefile correspondingly.
- Results confirm that everything is fine. HUZZAH. Although noticed a slight dip in performance (maybe)? Probably negligible. 
- Decided to store current progress as v1.11 - next up, fixing all the things.

v1.2F (Final):
- Copy-pastaed source code from http://www.fortran-2000.com/rank/ (written by Michel Olagnon) for efficient code for array ranking. Updated Makefile correspondingly.
- Wrote and tested mechanism for dividing processes up into groups (hardcoded: no of groups = 0.1 * no of processes) - rank 0 process is taken to be the root process.
- To circumvent lack of flexibility with pointers, decided to make ANOTHER object to handle the MPI broadcasting of solutions within a group.
- Added new module OptQC_MPI.f90 to handle MPI broadcasting stuffs. Edited Makefile correspondingly.
- Bleh so much effort to write the entire module. Spent ages debugging (reminder: if the mpi malloc/free fails at MPI_Finalize, probably an indicator of out-of-bounds array).
- Also: Anomaly of 300 -> 500 secs computation time for 3CT3? Communication? Doesn't look like multiplication. MAYBE ONCE IN A BLUE MOON????
- Gave up on using matrix signatures - just computed matrix difference from original, and everything looks 0. So should be legit all good.
- Extended CPLX implementation with synchronization - tested and BAM IT WORKS.
- All results (generated using a pure random walk) are better than before woots.
- Decided to store current progress as v1.2F - should not require any more changes, it is as optimal as can be given the current framework. Next up, genetic algorithm woots.

v1.3FFR (Final-for-realz)
- Note: Genetic algorithm stuff flopped. Boooooo.
- Verified RandReal decomposition results.
- It appears that any deviation from the pure random walk is pointless - only yielding worse results. Time to get rid of threshold annealing stuff. RAWR.
- Gotten rid of threshold annealing stuff - ran simulations that confirm results.
- Updated documentation.
- Decided to store current progress as v1.3FFR - for REALZ this time. Unless I decide to sort out the inefficiency in recovering U in OptQC_MPI.f90. Which I probably won't.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Version History for OptQC

v0.1:
- Rewrote Makefile using automatic variables and pattern matching for conciseness.
- Recoding of MPI_COMPOSITE/v1.1 for more streamlined code (possibly at small expense of performance).
- Removed CPLX implementation for the moment, focus on overhauling REAL implementation first.
- Removed calculation of modified matrix via total Hamming distance minimization - it really wasn't helping (when permutation cost is included).
- Made new file OptQC_CSD - with the intention of coding the CSD process as an object.
- Absorbed functionality from CYGR_CSD.f90, CYGR_BLKCSD.f90, CYGR_CUTGATE.f90, OptQC_Output.f90 and some of CYGR_WRITEF.f90, OptQC_Common.f90 and OptQC_GateCount.f90 into OptQC_CSD.f90.
- Decided to store current progress as v0.1 - especially makefile changes. Object-oriented approach to be expanded upon and to be used in the coming versions.
- Reminder: Restore documentation for CSD functions? Meh, maybe not, or maybe later.

v0.2:
- Moved code for CYG_INDEXTABLE.f90 into OptQC_Common.f90, since well, that should have been done.
- Removed writing of "*_gates.txt" - difficult in general, and not particularly informative.
- Removed writing of the non-reduced solution in favour of just the reduced solution.
- Renamed output .tex file to "*_circuit.tex".
- Deleted files: CYG_INDEXTABLE.f90, CYGR_CSD.f90, CYGR_BLKCSD.f90, CYGR_CUTGATE.f90, OptQC_Output.f90, CYGR_WRITEF.f90, OptQC_GateCount.f90 and OptQC_Mem.f90 due to absorption or removal of functionality.
- Edited makefile correspondingly.
- Moved intermediate GATEY and GATEPI variables from csd_solution object to the csd_generator object.
- Fixed writing function for csd_solution - double precision changed to integer. Minor edits for clarity in said function.
- Defined the csd_solution_set object as a collection of csd_solution objects.
- Minor design changes to csd_solution and csd_solution_set objects.
- Further editing of OptQC_Main.f90 file.
- Edited OptQC_Perm.f90 to use objects in OptQC_CSD.f90.
- UGH DEBUGGING.
- Modified margins of .tex output.
- Decided to store current progress as v0.2 - planning to make another object to handle file output. WHAI >_>

v0.3:
- MOAR OBJECTS RAWR. Specifically for file output, called csd_write_handle.
- Added usage of csd_write_handle to csd_solution and csd_solution_set objects.
- Added usage of csd_write_handle to OptQC_Main.f90.
- UGH MOAR DEBUGGING.
- FIX GRAPHICAL ISSUES - DONE AND DONE!!!
- Checked convention for permutations - it is the assumed U = P^T U' P. HUZZAH!
- Compared with results from old code (MPI_COMPOSTITE/Test/) - initial decomposition results are different (with this one being more optimal). What the magical pony - clean code OP T_T
- Verified results by multiplying out all the gates in a circuit using debugging code.
- Modified output precision for gate parameter values.
- Note: The convention U = P^T U' P means that 1st matrix in array is the last to be applied. Hence, order of output in csd_solution_set_write_circuit has been reversed.
- Fixed unusage of sgn parameter by introducing the neg logical element in the csd_solution, csd_solution_set, csd_generator and csd_write_handle objects.
- TODO: Paradoxial relationship between visual prettiness and accuracy - currently about 10^-5 error for 4 decimal places. Leaving it for now.
- Decided to store current progress as v0.3 - need to add the qubit permutation nonsense in the next version.

v0.4:
- UGH time to add qubit permutation stuff.
- Added subroutines qperm_compute(), qperm_generate() and qperm_reverse() for qubit permutations in OptQC_Perm.f90.
- Modified OptQC_Main.f90 to use the qubit permutation stuffs.
- Checked that qperm_generate() is working correctly - fixed the root process to have the identity qubit permutation.
- Modified init_random_seed() to be process dependent.
- TODO: Possibly relook at usage of system_clock() in init_random_seed() in an MPI environment? Also maybe change usage of random_number() in RINT() to an MPI function instead?
- Debugged qperm_compute() using ALL THE MAGIC.
- Updated OptQC_Main.f90 with usage of qperm stuffs.
- Done and done - verified results with the SWAP gates added in! IT WORKS! HUZZAH!
- Decided to store current progress as v0.4 - planning to add mechanism to keep changing initial permutation until number of gates is lower, and also to muck around with simulated annealing parameters for optimal results.

v0.5:
- LET'S DO IT!
- Edited OptQC_Main.f90 to cycle non-root processes until the initial number of reduced gates is at least as low as that of the root process (i.e. the identity qubit permutation).
- Checked init_random_seed() and RINT() - conclusion being it should yield different results for different processes.
- Checked that rerolling permutations isn't the cause of the exceptionally long program runtime - perhaps the file writing?
- Adding object prog_args to encapsulate all arguments to the program - makes passing it less convoluted.
- Added two extra parameters: PERM_ITER_LIM abd TOL_COEFF.
- Edited makefile correspondingly.
- Added flags -warn all -nogen-interfaces for debugging purposes, and -xHost -ipo for optimization.
- Spent 3 days debugging a weird error - only to realize that qperm_process() was being called with one less argument than it should have. WHAI COMPILER SO DUMB (EVEN WITH WARNING MESSAGES ENABLED) >_>
- Everything looks fixed now - phew. Still testing results for different parameters.
- TODO: Test using non-reduced number of gates as objective function instead?
- TODO: Remove output of gateseq.txt plz.
- Decided to store current progress as v0.5 - planning to WRITE ALL THE CPLX CODE in the next version.

v0.51:
- RAWR TIME TO WRITE ALL THE CPLX CODE.
- Modified the csd_write_handle_assign_target() procedure to be somewhat more generic.
- Moved the csd_write_handle object to OptQC_CommonModule.f90 instead.
- Copied OptQC_CPLX_WKVar.f90 into the folder.
- Created OptQC_CPLX_CSD.f90 and did massive copying and SHO MUCH EDITING, based of code from CPLX implementation.
- Edited makefile correspondingly.
- Seems to now compile correctly for OptQC_CPLX_CSD.f90. WOOTS!
- Design issues OP - permutation matrices being interpreted as complex matrices (and hence uses the complex decomposition) - terribly inefficient for permutation matrices, which are completely real.
- Ended up with a gajillion segfaults during deallocation - also design issues are sads.
- Decided to store current progress as v0.51 - overhauling objects using a combined real/cplx implementation.

v0.52:
- LET's DO IT! ONCE MORE! OVERHAUL TIME!
- Combined the real and cplx pairs for OptQC_CSD.f90, OptQC_Perm. SHO MUCH WORK!
- Corresponding edits in the main files.
- FOUND BUG STUPID CIRCUIT ALLOCATION SIZE FOR CPLX >_>
- Identified bad mistake in looping through qubit permutations - changed < to <=.
- Identified bug in csd_generator_run_csdphase() procedure - dimensions of Z should be (M,M+N), not (M,M) which is obtained from the pointer.
- Fixed bug by introducing a new variable Z_temp specific to the cplx implementation.
- HUGE BUG HUNT:
-> Two issues: (1) this and MPI_COMPOSITE (see ../Comp folder) results do not match with Qcompiler results running on Fornax; and (2) Mathematica doesn't seem to match the results....
-> Issue (1) EVENTUALLY found to be because of dodgy reading procedure in Qcompiler where precision fails after a certain point - 'fixed' by adding -i8 -r8 compile options.
-> Decided on removing -i8 -r8 compile options from the Makefile of this program - unnecessary (results are still the same) and removes possible compile issues.
-> Issue (2) - NOT RESOLVED - either because Mathematica formulation is incorrect somehow - or because Qcompiler was bugged to begin with.
-> Good enough for now - I CAN FINALLY EAT!
-> Checked by entering a real matrix as complex, compared with Qcompiler implementation - results are identical! So the problem being complex matrices.... >_>
-> From real results, Mathematica indicates that GATEPHASE definition is {{1,0},{0,exp(i*param*pi)}}.
- Decided to store current progress as v0.52 - the bug hunt continues...

v0.6:
- MOAR BUG HUNTING:
-> Decomposed 2-by-2 matrix - results agree with Qcompiler as usual, but results are still fail.
-> Checking DORCSD/ZUNCSD subroutines - results are correct for both real and complex matrices - note that V1T and V2T matrices have already been conjugate transposed.
-> 'O' option for DORCSD/ZUNCSD gives the convention for the middle matrix as {{C,S},{-S,C}}
-> SDAJFHDFHKJFHAKSJFSAKHFASSKJFBASJFN problem in my testing script - after fixing that, YANGANG'S CODE IS VERIFIED TO BE COMPLETELY FINE.
-> I guess problem solved >_>
- Or not....getting segfaults for both REAL and CPLX implementation. WHAIIIIIII.
- Segfaulting for > 1 process? Output issues? Symptomatic? Possible issue with qubit permutation sec?
- Decided to store current progress as v0.6 - contemplating rerolling to earlier version.

v0.6R (Rerolled):
- Rolled back to v0.51 - last working version for the REAL implementation. 
- Slowly re-copied all changes from v0.52 and v0.6 back.
- And with (as far as I can tell) exactly the same code as v0.6, it somehow works, but v0.6 still dies with segfaults.
- Verified result correctness using Mathematica - ALL GOOD WOOTS.
- Removed debug code outputting gateseq.txt file.
- Removed extra output precision - was meant to help with debugging previously. 
- Decided to store current progress as v0.6R - still need to sort out problem with undefined get_command_argument() behaviour. Or should I bother?

v0.61:
- Derived from v0.6R.
- Small bugfix: Circuit output should no longer show an extra seperator under any circumstance - fixed by precounting seperators required.
- Readded documentation files FILE_LIST.txt and README.txt.
- Re-removed -i8 -r8 compile options.
- Realized that *_perm.dat files are somewhat obsolete, since they don't include information about the qubit permutation Q. Contemplating the need to output U' at all?
- Changed *_perm.dat output file - it now gives QPerm and Perm_sol lists only.
- This should be the final release version, unless there's more bugfixing to be done.
- WHAT DO YOU MEAN THERE'S A MEMORY LEAK T_T
- TODO: Implement recording of qubit selection phase.
- TODO: Also realized that the qubit selection phase terminates as soon as a smaller value has been found - fix maybe?
- Traced bug to removal of -i8 option while still passing MPI_INTEGER8 around in MPI routines - default integer kind is 4, not 8.
- Also massive Fortran lousiness in its MPI bindings - always expects arrays, since no way of obtaining the address of a scalar variable.
- TODO: Fix above nonsense.
- Okay so this isn't the final version - next one I promise! Decided to store current progress as v0.61 - sort out the above issues plz.

v0.62F (Final):
- Introduced MPI_int_buffer for sending an integer scalar - fixed up all the MPI communication subroutine calls.
- Settled on without the -i8 flag - so all the MPI communication uses the default MPI_INTEGER (which has KIND=4). I really don't need 64-bit integers.
- Did I mention how much Fortran is stoooopid?
- Implemented recording of qubit selection phase with no premarture termination.
- Implemented careful selection (using the ChooseN function) of N after finding cases where the original code would accidentally bloat a matrix with size exactly matching a power of 2.
- Okay this SHOULD be the final release version - pending approval by results.
- Results approved - THIS IS THE FINA RELEASE VERSION HUZZAH!