
Questions about the results #11

Open
RJzz opened this issue Aug 10, 2018 · 7 comments

RJzz commented Aug 10, 2018

Hello, I tried to evaluate the uploaded pre-trained restaurant model by running evaluation.py directly. I did not make any changes to the code, but I did not get the same results as in the paper. Is it necessary to tune some parameters? Thank you for your answer.
This is the result I got:
--- Results on restaurant domain ---
               precision  recall  f1-score  support

         Food      0.654   0.493     0.562      887
        Staff      0.406   0.270     0.324      352
     Ambience      0.245   0.143     0.181      251
    Anecdotes      0.000   0.000     0.000        0
        Price      0.000   0.000     0.000        0
Miscellaneous      0.000   0.000     0.000        0

  avg / total      0.527   0.381     0.442     1490

ruidan (Owner) commented Aug 10, 2018

All the parameters are already set correctly by default and there is no need to tune them. If you follow the evaluation instructions in the README, the results should be exactly as follows (I just checked again by downloading the current code):

(screenshot of the evaluation results omitted)

Did you uncomment line 28 in evaluation.py so that the pre-trained model is the one being evaluated?
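(For anyone following along: the switch being referred to is essentially a path toggle in the evaluation script. The snippet below is a hypothetical illustration of that kind of toggle, not the actual contents of evaluation.py; the variable and directory names are made up.)

import os

domain = 'restaurant'
out_dir = os.path.join('output_dir', domain)           # default: evaluate a model you trained yourself
# out_dir = os.path.join('pre_trained_model', domain)  # the kind of line one would uncomment to evaluate the released weights
print('Loading model weights from: %s' % out_dir)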

RJzz (Author) commented Aug 11, 2018

Yes, I uncommented line 28 in evaluation.py to set the model to be evaluated to the pre-trained model, and the only change I made was to the imports in model.py, as follows:
import logging
import os
os.environ['KERAS_BACKEND'] = 'theano'   # select the Theano backend before Keras is used
import keras.backend as K
K.set_image_dim_ordering('th')           # Theano-style (channels-first) dimension ordering
import importlib
importlib.reload(K)                      # Python 3 equivalent of Python 2's built-in reload()
from keras.layers import Dense, Activation, Embedding, Input
from keras.models import Model
from my_layers import Attention, Average, WeightedSum, WeightedAspectEmb, MaxMargin

Should I not be doing this? This is the result I got after downloading and running it again:
               precision  recall  f1-score  support

         Food      0.855   0.729     0.787      887
        Staff      0.792   0.636     0.706      352
     Ambience      0.781   0.470     0.587      251
    Anecdotes      0.000   0.000     0.000        0
        Price      0.000   0.000     0.000        0
Miscellaneous      0.000   0.000     0.000        0

  avg / total      0.827   0.664     0.734     1490

thanks a lot!

ruidan (Owner) commented Aug 11, 2018

  1. It seems you are using Python 3. The code was tested under Python 2.7, and I am not sure whether this affects the results. Also check the versions of the other dependencies.

  2. Did you download the preprocessed datasets directly, or did you preprocess the original datasets again using the script provided? You should use the preprocessed dataset with the saved model. If you preprocess the dataset again, the word indexes may not exactly match the saved word embeddings, as I don't remember whether I changed anything in the preprocessing script while cleaning up the code (a minimal consistency check is sketched after this list).
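(A minimal sketch of the check suggested in point 2, assuming the word-to-index vocabulary can be loaded as a Python dict; the file names here are hypothetical and not the repo's actual layout.)

import pickle

# vocabulary shipped with the pre-trained model vs. one rebuilt by re-running preprocessing
with open('vocab_shipped.pkl', 'rb') as f:
    vocab_shipped = pickle.load(f)    # word -> index mapping used when the model was trained
with open('vocab_rebuilt.pkl', 'rb') as f:
    vocab_rebuilt = pickle.load(f)    # word -> index mapping from the re-run preprocessing script

changed = [w for w, i in vocab_shipped.items() if vocab_rebuilt.get(w) != i]
print('%d words map to a different index; any non-zero count misaligns the saved embeddings' % len(changed))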

RJzz (Author) commented Aug 11, 2018

Thanks! I tried Python 2.7 and got a similar result.

ilivans commented Nov 3, 2018

Hi @ruidan,
Great job!
I wonder why the results above differ from those reported in the article?
(screenshot of the results table from the paper)
Is it the random seed or something else? Can we reproduce the same results as presented in the article?

ThiagoSousa commented Nov 6, 2018

I also tried to replicate the results and got the same as @ruidan. The result is close to the paper's, but not exactly the same. Was a different training set or different parameters used in the paper?

(screenshot of the replicated evaluation results omitted)

EDIT: I checked the default parameters in the code and they are pretty much the same as the paper's. The paper mentions that the reported results are an average of 10 runs; therefore, @ilivans, that might explain the different results.
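(Purely illustrative of that last point: the single number reported in the paper is the mean over 10 independent trainings, so any one run can land above or below it. The scores below are placeholders, not real measurements.)

import numpy as np

# one F1 score per training run (placeholder values -- substitute your own 10 runs)
f1_per_run = [0.787, 0.781, 0.792, 0.779, 0.785, 0.790, 0.783, 0.776, 0.788, 0.784]
print('mean F1 over %d runs: %.3f (std %.3f)' % (len(f1_per_run), np.mean(f1_per_run), np.std(f1_per_run)))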

EDIT 2: disregard my other questions; I found the answer in the paper.

ilivans commented Nov 6, 2018

@ThiagoSousa thank you! Shame on me for not noticing that detail.
