What is the true size of dataset? #7

Open
jay2012-lin opened this issue May 30, 2019 · 3 comments

jay2012-lin commented May 30, 2019

The paper "Name Disambiguation in AMiner: Clustering, Maintenance, and Human in the Loop" in KDD said you used dataset with 70,258 documents from 12,798 authors. But the dataset in this project has 203,078 documnents. It is bigger than what said in paper. However, you said in project NOTE:

"Training data in this demo are smaller than what we used in the paper, so the performance (F1-score) will be a little bit lower than reported scores."

These two statements are contradictory. Could you tell me the true size of the dataset used in the paper? Can you help me?

Best regards!

@kourenmu

I think the sentence "We sampled 100 author names from a well-labeled subset of AMiner database. The benchmark consists of 70,258 documents from 12,798 authors." refers only to the test set.
In the test set of this project, there are 100 names but only 6,399 authors.

@shivashankarrs

Yeah, it is not clear which subset of the dataset (out of the 600 author groups, covering both train and test) was manually annotated. I am not able to find the subset with 100 names, 70,258 documents, and 12,798 authors.

@sanlunainiu

I think the sentence "We sampled 100 author names from a well-labeled subset of AMiner database. The benchmark consists of 70,258 documents from 12,798 authors." refers only to the test set. In the test set of this project, there are 100 names but only 6,399 authors.

In fact, the test set "name_to_pubs_test_100.json" contains only 6,399 authors and 35,129 documents, which is roughly half of the numbers reported by the authors ("The benchmark consists of 70,258 documents from 12,798 authors.").
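
For reference, a minimal sketch of how these counts can be reproduced from the file, assuming name_to_pubs_test_100.json maps each author name to a dict of author IDs, each holding a list of that author's paper entries (this layout is an assumption about the file, not something stated in the paper):

```python
import json

# Count names, candidate authors, and documents in the test split.
# Assumption: name_to_pubs_test_100.json has the layout
#   name -> author_id -> list of paper entries
with open("name_to_pubs_test_100.json", encoding="utf-8") as f:
    name_to_pubs = json.load(f)

n_names = len(name_to_pubs)
n_authors = sum(len(authors) for authors in name_to_pubs.values())
n_docs = sum(len(pubs)
             for authors in name_to_pubs.values()
             for pubs in authors.values())

print(f"names: {n_names}, authors: {n_authors}, documents: {n_docs}")
# If the counts above are right, this should print
# 100 names, 6,399 authors, 35,129 documents.
```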
