The KDD paper "Name Disambiguation in AMiner: Clustering, Maintenance, and Human in the Loop" says you used a dataset with 70,258 documents from 12,798 authors. But the dataset in this project has 203,078 documents, which is larger than what the paper describes. However, the project NOTE says:
"Training data in this demo are smaller than what we used in the paper, so the performance (F1-score) will be a little bit lower than reported scores."
These two statements are contradictory. Which dataset did you actually use in the paper? Can you help me?
Best regards!
I think the statement “We sampled 100 author names from a well-labeled subset of AMiner database. The benchmark consists of 70,258 documents from 12,798 authors.” refers only to the test set.
In the test set of this project, there are 100 names but only 6,399 authors.
Yeah, it is not clear which subset of the dataset (out of the 600 author groups, including both train and test) is manually annotated. I am not able to find the subset with 100 names and 70,258 documents from 12,798 authors.
I think the statement “We sampled 100 author names from a well-labeled subset of AMiner database. The benchmark consists of 70,258 documents from 12,798 authors.” refers only to the test set. In the test set of this project, there are 100 names but only 6,399 authors.
In fact, the test set "name_to_pubs_test_100.json" contains only 6,399 authors and 35,129 documents, which is half of the numbers the authors report in "The benchmark consists of 70,258 documents from 12,798 authors."
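For anyone who wants to check these counts themselves, here is a minimal sketch. It assumes the JSON file maps each author name to a dict of author IDs, each holding a list of publication IDs; that layout is my assumption and is not confirmed in this thread. The function name `count_benchmark` and the sample data are made up for illustration.

```python
import json


def count_benchmark(name_to_pubs):
    """Count names, authors, and documents in a name -> author -> pubs mapping.

    Assumed layout (not confirmed by the project docs):
        {"name": {"author_id": ["pub_id", ...], ...}, ...}
    """
    n_names = len(name_to_pubs)
    n_authors = sum(len(authors) for authors in name_to_pubs.values())
    # Flatten every publication ID across all names and authors.
    pub_ids = [
        pub_id
        for authors in name_to_pubs.values()
        for pubs in authors.values()
        for pub_id in pubs
    ]
    # Report both the raw total and the de-duplicated total, since a
    # document could in principle appear under more than one name.
    return n_names, n_authors, len(pub_ids), len(set(pub_ids))


# Tiny hypothetical sample, only to show the expected shape of the data.
sample = {
    "li_wei": {"a1": ["p1", "p2"], "a2": ["p3"]},
    "j_smith": {"a3": ["p4", "p1"]},
}
print(count_benchmark(sample))
```

To run it on the real file, load it with `json.load(open("name_to_pubs_test_100.json"))` and pass the result to `count_benchmark`. If the raw and de-duplicated document totals differ, that may be worth mentioning when comparing against the numbers in the paper.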