Set TP argument correctly when instantiating PagedKVCacheManager (IBM#94)

#### Motivation

Users are seeing runtime errors when using tensor parallelism (TP>1) with
speculative decoding.

#### Modifications

When we instantiate the PagedKVCacheManager, pass the engine's world size
as `tensor_parallel_size` instead of the hard-coded value 1.

#### Result

I have verified that this change resolves the reported issue. 

#### Related Issues

https://huggingface.co/ibm-fms/llama3-8b-accelerator/discussions/1

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
tdoublep authored May 10, 2024
1 parent e87d462 commit ddc56ee
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion server/text_generation_server/models/paged_causal_lm.py
@@ -327,7 +327,7 @@ def __init__(
 model_config.num_attention_heads,
 model_config.hidden_size,
 kv_heads=model_config.num_key_value_heads,
-tensor_parallel_size=1,
+tensor_parallel_size=self.engine.world_size,
 dtype=dtype,
 device=self.device,
 total_num_gpu_blocks=total_num_gpu_blocks,
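The commit does not say why a hard-coded `tensor_parallel_size=1` fails. One plausible explanation is that each rank's cache must be sized for only its own share of the KV heads. The sketch below illustrates that idea under this assumption: `per_rank_kv_cache_shape` is a hypothetical helper, not part of PagedKVCacheManager, and the real cache layout may differ.

```python
# Illustrative sketch only: `per_rank_kv_cache_shape` is a hypothetical helper,
# not the PagedKVCacheManager API. It assumes KV heads are split evenly across
# tensor-parallel ranks, which the commit itself does not state.

def per_rank_kv_cache_shape(num_blocks, block_size, num_heads, hidden_size,
                            kv_heads, tensor_parallel_size):
    """Return (blocks, block_size, local_kv_heads, head_dim) for one rank."""
    if kv_heads % tensor_parallel_size != 0:
        raise ValueError("kv_heads must be divisible by tensor_parallel_size")
    head_dim = hidden_size // num_heads
    local_kv_heads = kv_heads // tensor_parallel_size
    return (num_blocks, block_size, local_kv_heads, head_dim)


# Llama-3-8B-like sizes: 32 attention heads, hidden size 4096, 8 KV heads.
full = per_rank_kv_cache_shape(100, 16, 32, 4096, 8, tensor_parallel_size=1)
sharded = per_rank_kv_cache_shape(100, 16, 32, 4096, 8, tensor_parallel_size=2)
print(full)     # (100, 16, 8, 128)
print(sharded)  # (100, 16, 4, 128)
```

With two ranks, a cache sized as if `tensor_parallel_size` were 1 would expect 8 KV heads per rank while each sharded model rank produces only 4, which is the kind of shape mismatch that could surface as a runtime error.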
