- Test for group + multiple fields
- Intersect with single posting list
- Test for erase dropping elements below compressed list threshold
- Test for array token positions
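One reading of the "Intersect with single posting list" item is a linear merge of two sorted lists of document IDs. Below is a minimal sketch. It assumes plain ascending, duplicate-free std::vector<uint32_t> lists rather than the compressed lists the index actually keeps. intersect_postings is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: intersects two ascending, duplicate-free posting lists
// of document IDs (assumed uncompressed) with a linear two-cursor merge.
std::vector<uint32_t> intersect_postings(const std::vector<uint32_t>& a,
                                         const std::vector<uint32_t>& b) {
    std::vector<uint32_t> out;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) {
            ++i;                 // a is behind: advance it
        } else if (b[j] < a[i]) {
            ++j;                 // b is behind: advance it
        } else {
            out.push_back(a[i]); // present in both lists
            ++i;
            ++j;
        }
    }
    return out;
}
```

Each iteration advances at least one cursor, so the merge costs O(a.size() + b.size()). std::set_intersection from <algorithm> performs the same merge.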
Search index
- Fix memory ratio (decreasing with indexing)
- Speed up wildcard searches further
- Allow int64 in default sorting field
- Use connection timeout for CURL rather than request timeout
- Async import
- Highlight all matching fields
- Proper JSON as input
- Storing raw JSON input to RocksDB
- ART for every indexed field
- Delete should remove from RocksDB
- Speed up UUID generation
- Make the search score computation customizable
- art int search should support signed ints
- Search across multiple fields
- Have set inside topster itself
- Persist next_seq_id
- collection_id should be int, not string
- API should return count
- Fix documents.jsonl path in tests
- Multi field search tests
- storage key prefix should include collection name
- Index and search on multi-valued field
- range search for art_int
- Restore records as well on restart (like for meta)
- drop collection should remove all records from the store
- Multi-key binary search during scoring
- Assumption that all tokens match for scoring is no longer true
- Filters
- Facets
- Schema validation during insertion (missing fields + type errors)
- Proper score field for ranking tokens
- Throw errors when schema is broken
- Desc/Asc ordering with tests
- Found count is wrong
- Filter query in the API
- Facet limit (hardcode to top 10)
- Deprecate old split function
- Multiple facets not working
- Search snippet with highlight
- Snippet should only be around surrounding matching tokens
- Proper pagination
- Pagination parameter
- Drop collection API
- JSONP response
- "error":"Not found." is sent when query has no hits
- Fix API response codes
- List all collections
- Fetch an individual document
- ID field should be a string: must validate
- Number of records in collection
- Test for asc/desc upper/lower casing
- Test for search without any sort_by given
- Test for collection creation validation
- Test for delete document
- art float search
- When prefix=true, use default_sorting_field for token ordering only for last word
- only last token should be prefix searched
- Prefix-search strings should not be null terminated
- sort results by float field
- json::parse must be wrapped in try catch
- Collection Manager collections map should store plain collection name
- init_collection of Collection manager should probably take seq_id as param
- node score should be int32, no longer uint16 like in document struct
- Typo in prefix search
- When field of "id" but not string, what happens?
- test for num_documents
- test for string filter comparison: title < "foo"
- Test for sorted_array::indexOf when length is 0
- Test for pagination
- search_fields, sort_fields and facet fields should be combined
- facet fields should be indexed verbatim
- change "search_by" to "query_by"
- during index_in_memory() validations should be front loaded
- Support default sorting field being a float
- https support
- Validate before string to int conversion in the http api layer
- art bool support
- Export collection
- get collection should show schema
- API key should be allowed as a GET parameter also (for JSONP)
- Don't crash when the data directory is not found
- When the first sequence ID is not zero, bail out
- Proper status code when sequence number to fetch is bad
- Replica should be read-only
- string_utils::tokenize should not have max length
- handle hyphens (replace them)
- clean special chars before indexing
- Add docs/explanation around ranking calc
- UTF-8 normalization
- Use rocksdb batch put for atomic insertion
- Proper logging
- Handle store->get() not finding a key
- Deprecate converting integer to string verbatim
- Deprecate union type punning
- Replica server should fail when pointed to "old" master
- gzip compress responses
- Have a LOG(ERROR) level
- Handle SIGTERM which is sent when process is killed
- Use snappy compression for storage
- Fix exclude_scalar early returns
- Fix result ids length during grouped overrides
- Fix override grouping (collate_included_ids)
- Test for overriding result on second page
- At least 1 token match before proceeding with drop tokens
- support wildcard query with filters
- API for optimizing on disk storage
- Jemalloc
- Exact search
- NOT operator support
- Log operations
- Parameterize replica's MAX_UPDATES_TO_SEND
- 64K token limit
- INT32_MAX validation for float field
- highlight of string arrays?
- test for token ranking on float field
- test for float int field deletion during doc deletion
- Test for snippets
- Test for replication
- Query token ids should match query token ordering
- ID should not have "/"
- Group results by field
- Delete using range: https://github.com/facebook/rocksdb/wiki/Delete-A-Range-Of-Keys
- Test for string utils
- Prevent string copy during indexing
- Minimum results should be a variable instead of blindly going with max_results
- Handle searching for non-existing fields gracefully
- test for same match score but different primary, secondary attr
- Support nested fields via "."
- Support search operators like +, - etc.
- Space sensitivity
- Use bitmap index instead of compressed array for doc list?
- Primary_rank_scores and secondary_rank_scores hashmaps should be combined?
- d-ary heap?
- topster: reject min heap value compare only when field is same
- match index instead of match score
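For the "Test for sorted_array::indexOf when length is 0" item, the edge case is a binary search that must neither read the array nor underflow an unsigned index when the array is empty. Below is a minimal sketch over a plain uncompressed array. index_of is hypothetical, and returning length on a miss is an assumed convention, not necessarily the one sorted_array uses.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for sorted_array::indexOf over an uncompressed,
// ascending array. Returns the position of `value`, or `length` if absent
// (assumed convention). The half-open [lo, hi) range means length == 0
// skips the loop entirely: arr is never read and nothing underflows.
std::size_t index_of(const uint32_t* arr, std::size_t length, uint32_t value) {
    std::size_t lo = 0, hi = length;
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;  // avoids (lo + hi) overflow
        if (arr[mid] < value) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    // lo is now the first position whose element is >= value.
    return (lo < length && arr[lo] == value) ? lo : length;
}
```

A test for the empty case can then pass a null pointer with length 0 and expect 0 back.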
API
- Support the following operations:
  - create a new index
  - index a single document
  - delete a document by ID
  - query an index
  - Drop an index
  - fetch a document by ID
Clustering
- Sync every incoming write with another Typesense server
Refactoring
- token_count in leaf is redundant: can be accessed from value
- storing length in offsets is redundant: it can be found by looking up value of the next index in offset_index
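The offsets item can be illustrated under an assumed layout. In this layout, offsets holds the token positions of all documents back to back, and offset_index[i] marks where document i's run starts. Each run's length then follows from two neighbouring index entries, so storing it separately is redundant. offsets_length is a hypothetical helper, and the layout is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout: `offsets` concatenates every document's token positions;
// `offset_index[i]` is where document i's run begins in `offsets`.
// Hypothetical helper that derives the run length instead of storing it.
std::size_t offsets_length(const std::vector<uint32_t>& offset_index,
                           const std::vector<uint32_t>& offsets,
                           std::size_t i) {
    // A run ends where the next document's run begins; the last document's
    // run ends at the end of the offsets array.
    std::size_t end = (i + 1 < offset_index.size()) ? offset_index[i + 1]
                                                    : offsets.size();
    return end - offset_index[i];
}
```

For offset_index {0, 2, 5} over six offsets, the three runs have lengths 2, 3 and 1.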
Tech debt
- Use GLOB file pattern for CMake (better IDE refactoring support)
- DRY index_int64_field* methods