Enhancement Request: Add Option to Specify Count Parameter in all_pks Method #512

anandhu-gopi · 2023-05-17T17:51:27Z

Issue Description

Current Behavior

The all_pks method in the redis-om library lacks the option to specify the count parameter for the SCAN command. As a result, it uses a default count value, causing performance issues when dealing with a large number of keys in Redis.

https://github.com/redis/redis-om-python/blob/2b450b5feec5602ac823d14cbb696ec73cf02347/aredis_om/model/model.py#LL1359C1-L1368C1

    #! Current Behavior
    @classmethod
    async def all_pks(cls):  # type: ignore
        key_prefix = cls.make_key(cls._meta.primary_key_pattern.format(pk=""))
        # TODO: We need to decide how we want to handle the lack of
        #  decode_responses=True...
        return (
            remove_prefix(key, key_prefix)
            if isinstance(key, str)
            else remove_prefix(key.decode(cls.Meta.encoding), key_prefix)
            async for key in cls.db().scan_iter(f"{key_prefix}*", _type="HASH")
        )

Desired Behavior

I would like to propose adding an option to pass the count parameter in the all_pks method. This would allow users to customize the number of keys processed at a time, providing better control over performance when working with Redis.

    #! Expected Behavior
    @classmethod
    # we can make count as an optional parameter 
    # to make the change non-breaking
    async def all_pks(cls, count: Optional[int]= None):  # type: ignore
        key_prefix = cls.make_key(cls._meta.primary_key_pattern.format(pk=""))
        # TODO: We need to decide how we want to handle the lack of
        #  decode_responses=True...
        return (
            remove_prefix(key, key_prefix)
            if isinstance(key, str)
            else remove_prefix(key.decode(cls.Meta.encoding), key_prefix)
            async for key in cls.db().scan_iter(f"{key_prefix}*", _type="HASH", count= count)
        )

Steps to Reproduce (if applicable)

N/A

Impact

Adding this feature would improve the performance and scalability of the library when working with Redis instances containing a large number of keys. It would provide users with better control over the data retrieval process and enable them to fine-tune the performance according to their needs.

Additional context:

I have gathered some examples that highlight the performance gains achieved by modifying the value of the count parameter. These examples serve as evidence for the potential improvements in performance when users can customize the count value in the all_pks method. (I have used pyinstrument to profile code)

Note:

It is important to note that determining the ideal count value depends on the specific Redis instance and the number of items it contains. The count value I have used in my examples is based on the number of items in my Redis instance and may not be universally applicable

Test Code

from typing import Optional
from aredis_om import HashModel, get_redis_connection

class DummyModel(HashModel):
    
    class Meta:
        global_key_prefix = "logomatcher_orchestrator"
        model_key_prefix = "SimilarityScoreModel"
        database = get_redis_connection(
            host="redishost",
            port=6379,
            db=0,
        )


async def print_keys_with_prefix_matching(prefix:str,count:Optional[int] = None):
    async for key in DummyModel.db().scan_iter(f"{prefix}*", _type="HASH",count=count):
        print(key)

time took with the default count value

>>> profiler = Profiler()
>>> profiler.start()
>>>
>>> asyncio.get_event_loop().run_until_complete( print_keys_with_prefix_matching("test") )

>>> profiler.stop()
<pyinstrument.session.Session object at 0x7fc9d7002a60>
>>>
>>> profiler.print()

  _     ._   __/__   _ _  _  _ _/_   Recorded: 17:40:33  Samples:  57622
 /_//_/// /_\ / //_// / //_'/ //     Duration: 97.692    CPU time: 48.077
/   _/                      v4.4.0

Program:

97.692 MainThread  <thread>:140504873899840
├─ 49.604 Redis.scan_iter  aioredis/client.py:2413
│     [114 frames hidden]  aioredis, asyncio, <built-in>, async_...
├─ 43.834 <module>  <stdin>:1
│     [330 frames hidden]  <stdin>, asyncio, aioredis, async_tim...
├─ 1.468 HiredisParser.can_read  aioredis/connection.py:477
│     [8 frames hidden]  aioredis, async_timeout
└─ 1.124 StreamReader.read  asyncio/streams.py:643
      [4 frames hidden]  asyncio

time took with the count value 1000

>>> import asyncio
>>> from pyinstrument import Profiler
>>>
>>> profiler = Profiler()
>>> profiler.start()
>>>
>>> asyncio.get_event_loop().run_until_complete( print_keys_with_prefix_matching("test",count=1000) )
>>> profiler.stop()
<pyinstrument.session.Session object at 0x7fc9d6fa4cd0>
>>>
>>> profiler.print()

  _     ._   __/__   _ _  _  _ _/_   Recorded: 17:46:30  Samples:  580
 /_//_/// /_\ / //_// / //_'/ //     Duration: 0.782     CPU time: 0.471
/   _/                      v4.4.0

Program:

0.781 MainThread  <thread>:140504873899840
├─ 0.644 <module>  <stdin>:1
│     [130 frames hidden]  <stdin>, asyncio, selectors, <built-i...
└─ 0.136 Redis.scan_iter  aioredis/client.py:2413
      [78 frames hidden]  aioredis, asyncio, <built-in>, async_...

Environment

Library version: 0.0.27
Python version: 3.9.16

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enhancement Request: Add Option to Specify Count Parameter in all_pks Method #512

Enhancement Request: Add Option to Specify Count Parameter in all_pks Method #512

anandhu-gopi commented May 17, 2023

Enhancement Request: Add Option to Specify Count Parameter in all_pks Method #512

Enhancement Request: Add Option to Specify Count Parameter in all_pks Method #512

Comments

anandhu-gopi commented May 17, 2023

Issue Description

Current Behavior

Desired Behavior

Steps to Reproduce (if applicable)

Impact

Additional context:

Note:

Test Code

time took with the default count value

time took with the count value 1000

Environment