This repo contains the code, data, and models for "TIMO: Towards Better Temporal Reasoning for Language Models".
Our models are all available on Hugging Face:
The main results of TIMO are shown below:
Model | Math-time Avg | Pure-time Avg | Average |
---|---|---|---|
Timo | 64.4 | 78.07 | 72.7 |
MAmmoTH | 57.08 | 62.71 | 60.0 |
WizardMath | 58.8 | 61.26 | 59.9 |
CodeLlama | 54.55 | 64.10 | 59.8 |
LLaMA2 | 57.65 | 66.30 | 62.7 |
WizardCoder | 53.05 | 59.83 | 57.8 |
ToRA | 51.03 | 65.71 | 58.2 |
TimeLLaMA | 48.3 | 29.0 | 38.6 |
Model | Math-time Avg | Pure-time Avg | Average |
---|---|---|---|
Timo | 72.83 | 82.97 | 78.3 |
MAmmoTH | 70.68 | 69.52 | 72.1 |
LLaMA2 | 66.18 | 70.42 | 70.7 |
WizardMath | 63.65 | 70.62 | 68.4 |
WizardCoder | 61.6 | 66.08 | 65.9 |
CodeLlama | 63.55 | 67.05 | 65.7 |
ToRA | 57.85 | 68.90 | 65.6 |
We introduce TIMO 🌱, a series of open-source large language models (LLMs) designed for temporal reasoning. Building on our finding that mathematical reasoning and temporal reasoning are closely related, we introduce a self-critic temporal optimization method to equip the model with comprehensive temporal reasoning capabilities. TIMO models are trained on preference pairs that the model itself generates and critiques on temporal tasks, encompassing 19 pure-time temporal tasks. TIMO generalizes across all temporal tasks without sacrificing its general-task abilities, establishing itself as the new state-of-the-art model among models of comparable size.
Clone this repository and install the required packages:
```shell
git clone https://github.com/zhaochen0110/Timo.git
cd Timo
pip install -r requirements.txt
```
To play with our model, run:
```python
from transformers import pipeline

# Avoid shadowing the imported `pipeline` function and the built-in `input`
generator = pipeline("text-generation", model="Warrieryes/timo-7b-hf")

template = '''Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{query}\n\n### Response:'''
query = "What is 08:32 AM - 04:28?\n (A) 6:10 AM\n (B) 2:49 AM\n (C) 6:17 AM\n (D) 4:04 AM"
prompt = template.format(query=query)
output = generator(prompt)[0]['generated_text']
print(output)
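For reference, the expected answer to the sample query can be verified with plain Python date arithmetic; this is just a sanity check of the gold answer, not part of the model pipeline:

```python
from datetime import datetime, timedelta

# 08:32 AM minus a duration of 4 hours 28 minutes
start = datetime.strptime("08:32 AM", "%I:%M %p")
result = start - timedelta(hours=4, minutes=28)
print(result.strftime("%I:%M %p").lstrip("0"))  # 4:04 AM, i.e. option (D)
```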
To replicate the experimental results in our paper, run:
```shell
python inference.py \
    --model_path $model_path \
    --data_path $data_path \
    --excel_folder $excel_folder \
    --output_path $output_path
```
We use the MAmmoTH project's code to train the mathematical models. We then generate temporal preference pairs with the following command:
```shell
python generate.py \
    --model_path $model_path \
    --generate True \
    --train_data_path $train_data_path \
    --score True \
    --save_path $save_path
```
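Conceptually, each preference pair couples a temporal question with a higher-scored (chosen) and a lower-scored (rejected) model response. Below is a minimal sketch of what one record might look like, assuming a `prompt`/`chosen`/`rejected` schema; the actual field names and answers are defined by the output of `generate.py`, so check its JSON before relying on this shape:

```python
import json

# Hypothetical record for illustration only; the real schema comes from generate.py.
preference_pair = {
    "prompt": "What is 08:32 AM - 04:28?",
    "chosen": "08:32 AM minus 4 hours and 28 minutes is 4:04 AM.",    # higher-scored response
    "rejected": "08:32 AM minus 4 hours and 28 minutes is 6:10 AM.",  # lower-scored response
}
print(json.dumps(preference_pair, indent=2))
```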
After generating preference pairs, we use Direct Preference Optimization (DPO) to train the model:
```shell
python tdpo.py \
    --model_name_or_path $model_name_or_path \
    --json_path $json_path \
    --output_dir $output_dir
```
This project is licensed under the Apache 2.0 license - see the LICENSE file for details.
This project is partly based on the work done in MAmmoTH. Special thanks to its authors for their valuable contributions.
Please cite our paper if you use our data, model, or code. Please also kindly cite the original dataset papers.
```bibtex
@article{su2024timo,
  title={Timo: Towards Better Temporal Reasoning for Language Models},
  author={Su, Zhaochen and Zhang, Jun and Zhu, Tong and Qu, Xiaoye and Li, Juntao and Zhang, Min and Cheng, Yu},
  journal={arXiv preprint arXiv:2406.14192},
  year={2024}
}
```