
This is a codebase for computing the medoid of a dataset accompanying the paper "Ultra Fast Medoid Identification via Correlated Sequential Halving" https://arxiv.org/abs/1906.04356


TavorB/Correlated-Sequential-Halving

Correlated-Sequential-Halving

Finds the medoid of n points efficiently, in approximately O(n log^2 n) time, whereas brute force takes O(n^2) time.
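For reference, the O(n^2) brute-force baseline can be sketched as follows. This is a minimal illustration and not the code in algorithm_brute.py: the function name brute_force_medoid is hypothetical, and Euclidean distance is assumed here purely for the example (the experiments may use other distances). It relies on math.dist, so it needs Python 3.8 or newer.

```python
import math


def brute_force_medoid(points):
    """Return the index of the medoid of `points`.

    The medoid is the point with the smallest total distance to all
    other points. This computes every pairwise distance, so it needs
    O(n^2) distance evaluations.
    Assumption: Euclidean distance (chosen only for this sketch).
    """
    best_index, best_total = -1, math.inf
    for i, p in enumerate(points):
        # Sum of distances from p to every point (distance to itself is 0).
        total = sum(math.dist(p, q) for q in points)
        if total < best_total:
            best_index, best_total = i, total
    return best_index


# Toy data: five points on a line; the middle one (index 2) is the medoid.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
print(brute_force_medoid(points))  # prints 2
```

The nested pass over all pairs is what the faster algorithms in this repository aim to avoid, by estimating each point's total distance from a subset of reference points rather than from all n.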

This codebase reproduces all the figures and numerical results in the paper "Ultra Fast Medoid Identification via Correlated Sequential Halving". Please reach out to me at tavorb "at" stanford.edu if you have any questions about running the code or replicating the results.

  1. All the figures can be viewed and regenerated with the IPython notebooks in the 'figure' folder.
  2. The figures are generated from experiments, which can be rerun with the following commands:
  • python algorithm_rand.py --dataset *** --num_exp 1000 --num_jobs 32 --verbose False
  • python algorithm_brute.py --dataset *** --num_exp 1 --num_jobs 1 --verbose False
  • python algorithm_meddit.py --dataset *** --num_exp 1000 --num_jobs 32 --verbose False
  • python algorithm_correlated.py --dataset *** --num_exp 1000 --num_jobs 32 --verbose False
    • The budget can be modified by changing valRange in algorithm_correlated.py. A 'doubling trick' can be used to search for a good budget.

With the following options:

  • dataset - name of the dataset, given in place of *** above (rnaseq20k, netflix20k, netflix100k, mnist)
  • num_exp - total number of experiments
  • num_jobs - number of experiments run in parallel
  3. Dependencies: tables and h5py are used to load the rnaseq data; h5py can be installed with python -m pip install h5py. The code is compatible with both Python 2 and 3.
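
The 'doubling trick' mentioned for the budget can be sketched roughly as below. This is only an illustration under stated assumptions, not code from this repository: run_with_budget is a hypothetical callable standing in for one run of the algorithm at a given budget, and stopping when two consecutive budgets return the same answer is a heuristic chosen for the sketch. How the resulting budget maps onto valRange is not covered here.

```python
def doubling_budget_search(run_with_budget, initial_budget, max_budget):
    """Double the budget until two consecutive runs agree.

    run_with_budget: hypothetical callable, budget -> candidate medoid index.
    Returns (budget, result) for the last budget tried. If max_budget is
    reached without agreement, the last result is returned as-is.
    """
    budget = initial_budget
    previous = run_with_budget(budget)
    while budget * 2 <= max_budget:
        budget *= 2
        current = run_with_budget(budget)
        if current == previous:
            # Heuristic stop: the answer did not change after doubling.
            return budget, current
        previous = current
    return budget, previous


def toy_run(budget):
    """Stand-in for the real algorithm: only 'correct' (index 7) once
    the budget reaches 400; smaller budgets give unstable answers."""
    return 7 if budget >= 400 else budget // 100


print(doubling_budget_search(toy_run, initial_budget=100, max_budget=10000))
# prints (800, 7)
```

Because the budget grows geometrically, the total work across all attempts stays within a constant factor of the work done at the final budget.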
