Louise-LuLin/graphHAT

Requirements

  • Python 3.6
  • TensorFlow 1.14

Dataset Preparation

First, process the dataset to generate the graph and node features. The expected format is as follows (a loading sketch follows the list):

  • feature.bin, the feature vectors of the node instances. To save space, we store them as a sparse matrix via scipy.sparse.lil_matrix. Each row holds all the features of one node.
  • label.bin, the one-hot labels of the node instances. Each row is a one-hot vector of the class label.
  • graph.bin, a dict in the format {index: [index_of_neighbor_nodes]}, where the neighbor nodes are organized as a list.
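
A minimal sketch of loading these three files, assuming they are Python pickles and live under a hypothetical data/cora/ directory; the actual serialization and paths in this repo may differ:

```python
import pickle

# Load the processed files described above (assumed pickle format).
with open("data/cora/feature.bin", "rb") as f:
    features = pickle.load(f)   # scipy.sparse.lil_matrix, one row of features per node
with open("data/cora/label.bin", "rb") as f:
    labels = pickle.load(f)     # one one-hot class vector per node
with open("data/cora/graph.bin", "rb") as f:
    graph = pickle.load(f)      # dict: {node index: [indices of neighbor nodes]}

print(features.shape, labels.shape, len(graph))
```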

Then, we randomly sample nodes to split them into train/eval/test sets. For the citation datasets cora, citeseer, and pubmed, we follow the GAT processing to split the train/val/test sets. For the Amazon product dataset, we use the default split provided by OGB. For other datasets, we use a 60%/20%/20% ratio (a split sketch follows the list). The node indices are stored as:

  • idx_train.bin, the indices of the training instances.
  • idx_eval.bin, the indices of the evaluation instances.
  • idx_test.bin, the indices of the test instances.
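
For the 60%/20%/20% case, a minimal sketch of such a random split; the file names follow the list above, while the pickle save format and output directory are assumptions:

```python
import pickle
import numpy as np

def random_split(num_nodes, train_ratio=0.6, eval_ratio=0.2, seed=42):
    """Randomly permute node indices and cut them into train/eval/test sets."""
    perm = np.random.RandomState(seed).permutation(num_nodes)
    n_train = int(train_ratio * num_nodes)
    n_eval = int(eval_ratio * num_nodes)
    return perm[:n_train], perm[n_train:n_train + n_eval], perm[n_train + n_eval:]

idx_train, idx_eval, idx_test = random_split(num_nodes=2708)  # e.g. cora has 2708 nodes
for name, idx in [("idx_train", idx_train), ("idx_eval", idx_eval), ("idx_test", idx_test)]:
    with open("data/%s.bin" % name, "wb") as f:
        pickle.dump(idx, f)
```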

Please refer to test_data.ipynb for detailed information about the datasets under the ./data folder.
