Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for serverless, reacting to api changes, adding tests #108

Merged
merged 15 commits into from
May 18, 2024

Conversation

maumar
Copy link
Contributor

@maumar maumar commented May 5, 2024

Fixes #91
Fixes #107

Lot of breaking changes here - there are significant differences between old and new APIs, also pod vs serverless Tests run against both pod and serverless using the free plan. Functionality not available in this plan has not been tested.

NOT TESTED (requires paid plan):

  • create collection from index,
  • restore index from collection,
  • list collections (when there is something in them),
  • pod-based index modifications (upscaling, downscaling)
  • delete vectors based on filter

- Moving to new global API
- Adding tests
- Minor API fixes to more accurately reflect the API spec

Fixes neon-sunset#91
Fixes neon-sunset#107

Lot of breaking changes here - there are significant differences between old and new APIs, also pod vs serverless
Tests are mainly using serverless, since at the time of writing free pod based offering is broken (indexes won't initialize)

NOT TESTED / OUTSTANDING ISSUES:
collection operations - serverless doesn't support collections, pod based gcp-starter is currently broken (does it support collections)
- create collection from index
- restore index from collection
- list collections (when there is something in them)

metadata filtering on non-query operations
- upsert, set metadata
- delete vectors based on filter

complex metadata operations (see https://docs.pinecone.io/guides/data/filtering-with-metadata#metadata-query-language)

sparse vectors

use legacy CreateIndex - environment is no longer part of PineconeClient definition, for now storing the value if legacy ctor was called. Also, where to get pods and pod_type?
@maumar
Copy link
Contributor Author

maumar commented May 5, 2024

still work in progress - I focused on serverless and that works pretty well, but I have no way currently to test if/how much is broken for the pod based indexes. Since the latest update gcp-starter has been broken.

@neon-sunset
Copy link
Owner

Thank you for the contribution!

there are significant differences between old and new APIs, also pod vs serverless Tests are mainly using serverless, since at the time of writing free pod based offering is broken (indexes won't initialize)

I had initial sketch of porting to the "new API", that is, the way it looked at the moment of introduction of serverless but ran into similar issues with only the old way working with pods and only the new way working with serverless, as well as billing issues. It was also possible in the past to get plain OpenAPI schema from the documentation page but they changed the way it worked and I could no longer manage to extract it or find in the published official SDKs at the time.

That and switching over from a side project tool I wrote that used to rely on Pinecone (hence no tests - the tool pretty much exercised all paths so running sample workloads gave high coverage immediately) led me to ultimately putting this project on the back burner.

Definitely happy to see interest still, if there is anything I can help with to get this done - please let me know.

@neon-sunset neon-sunset self-requested a review May 5, 2024 08:01
@maumar
Copy link
Contributor Author

maumar commented May 6, 2024

Thanks for a quick reply! According to the new blog post, new API should work across all environments now. But I will make sure once gcp starter issue is fixed on their end.
Current API is simple enough, so I looked at the docs and manually did everything - will do another thorough to make sure I didn't make copy-paste mistakes, also more tests should help (again, once starter pod works again).

Let me know if the general approach makes sense and is consistent with your design. My background is in databases and ORMs so this is all pretty new to me, but I'm interested in spending some time to improve the SDK.

spppaul

This comment was marked as off-topic.

@neon-sunset
Copy link
Owner

neon-sunset commented May 16, 2024

@maumar Thank you for working on this! Let me know if you are done so I can review this in full (i.e. look at API changes at Pinecone's side and etc.).

On tests dependent on API key - I think it's ok for them to fail for the time being, I'll look into getting a separate account for test environment.

src/PineconeClient.cs Outdated Show resolved Hide resolved
@maumar
Copy link
Contributor Author

maumar commented May 17, 2024

@neon-sunset I am close to being done. Probably good for a thorough review tomorrow, but will ping you again after I push the final changes.

src/Types/IndexTypes.cs Outdated Show resolved Hide resolved
@neon-sunset
Copy link
Owner

I added a few changes to make CI pass (if it's green, surely everything is OK, right? 😅), make F# example correct, and remove the remaining warnings (AOT-compatibility analysis for test projects is not needed, and most of them would be doing very AOT unfriendly things anyway).

All tests are for now skipped - will look into setting up a test env account as mentioned earlier.

@spppaul
Copy link

spppaul commented May 17, 2024

Do you guys have an ETA on when this will make it into nuget? I have this branch running locally and it works great.

@neon-sunset
Copy link
Owner

neon-sunset commented May 17, 2024

@spppaul this is a community-maintained library (and I'm happy this is now actually a community, with Maurycy's and Andrii's contributions!) licensed under MIT.

As with many other FOSS projects, the releases and features delivered are the result of work done by contributors in their spare time and are not subject to SLA or specific ETAs that can be committed to. This PR will likely be merged "soon(tm)" but there is no guarantee or promise to be made as this is not a commercial project bound by a contract.

@spppaul
Copy link

spppaul commented May 17, 2024

@spppaul this is a community-maintained library (and I'm happy this is now actually a community, with Maurycy's and Andrii's contributions!) licensed under MIT.

As with many other FOSS projects, the releases and features delivered are the result of work done by contributors in their spare time and are not subject to SLA or specific ETAs that can be committed to. This PR will likely be merged "soon(tm)" but there is no guarantee or promise to be made as this is not a commercial project bound by a contract.

I understand completely and thank you! Just letting you know this little gem you're working on is pretty much the only solution for not doing raw REST calls to serverless Pinecone in .NET. If I can't get it in nuget by next week I guess I'll just have to commit my local Pinecone .NET DLL to our repo and spend all day in meetings explaining it :)

@neon-sunset
Copy link
Owner

neon-sunset commented May 17, 2024

@maumar tests are failing as the secret is not accessible - GH prevents its leakage for PRs originating from users without secrets write access. It should work once merged.

When ran locally, it appears that currently index is being concurrently deleted from multiple locations. I added a dirty-ish workaround for this to just ignore the exceptions if Pinecone responded with Index not found.

I can see there are a few places where you intended to add more changes but hopefully git pull will run without you having to resolve any conflicts.

Given that it works as is (especially via gRPC), there is no outstanding feedback other than the comments above. The tests are quite a bit more complicated than I'm used to (mostly because my domain of work is low-level and there it's either ad-host/flat structure or bespoke reproduction code for complex scenarios), but I don't really have an opinion on that, if anything, it's useful to learn from those.

Make sure to add yourself to the list of authors in csproj :)

…ing test coverage for query with complex metadata
@maumar
Copy link
Contributor Author

maumar commented May 18, 2024

@neon-sunset thanks for the review and code fixes! I have just pushed in the final set of changes, so from my perspective the PR should be good to go.

Wrt test complexity - some of it stems from the fact that Pinecone doesn't have a good (any?) local testing story. My understanding is that it is an outlier - pretty much any other vector DB has a test container available: https://testcontainers.com/modules/ Hopefully Pinecone has something in the works as well, and in the future tests can be made much simpler and more robust.

maumar and others added 2 commits May 17, 2024 17:56
test: clear indexes in parallel
test: guard against delete called on an already deleted index in one more place
docs: generate XML documentation
docs: fix VS Code config for C# example
docs: make C# example publish AOT
@neon-sunset neon-sunset merged commit 9999bb3 into neon-sunset:main May 18, 2024
1 check failed
@neon-sunset
Copy link
Owner

neon-sunset commented May 18, 2024

@spppaul @maumar 2.0 version has been pushed to Nuget https://www.nuget.org/packages/Pinecone.NET/2.0.0

Let me know if something breaks.
Thanks!

@spppaul
Copy link

spppaul commented May 18, 2024

@spppaul @maumar 2.0 version has been pushed to Nuget https://www.nuget.org/packages/Pinecone.NET/2.0.0

Let me know if something breaks. Thanks!

I removed my local reference and used the 2.0 nuget package and it seemed to work the same, thanks!

@maumar maumar deleted the serverless_support branch May 18, 2024 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move PineconeClient to global API for managing index operations Serverless support
3 participants