Search reference
API reference for GenSX Cloud search components. Search is powered by turbopuffer, and their documentation for query and upsert operations is a useful reference to augment this document.
Installation
npm install @gensx/storage
SearchProvider
Provides vector search capabilities to its child components.
Import
import { SearchProvider } from "@gensx/storage";
Example
import { SearchProvider } from "@gensx/storage";
const Workflow = gensx.Component("SearchWorkflow", ({ input }) => (
<SearchProvider>
<YourComponent input={input} />
</SearchProvider>
));
useSearch
Hook that provides access to vector search for a specific namespace.
Import
import { useSearch } from "@gensx/storage";
Signature
function useSearch(name: string): Namespace;
Parameters
Parameter | Type | Description |
---|---|---|
name | string | The namespace name to access |
Returns
Returns a namespace object with methods to interact with vector search.
Example
const namespace = await useSearch("documents");
const results = await namespace.query({
vector: queryEmbedding,
includeAttributes: true,
});
Namespace methods
The namespace object returned by useSearch
provides these methods:
write
Inserts, updates, or deletes vectors in the namespace.
async write(options: WriteParams): Promise<number>
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
upsertColumns | UpsertColumns | undefined | Column-based format for upserting documents |
upsertRows | UpsertRows | undefined | Row-based format for upserting documents |
patchColumns | PatchColumns | undefined | Column-based format for patching documents |
patchRows | PatchRows | undefined | Row-based format for patching documents |
deletes | Id[] | undefined | Array of document IDs to delete |
deleteByFilter | Filters | undefined | Filter to match documents for deletion |
distanceMetric | DistanceMetric | undefined | Distance metric for similarity calculations |
schema | Schema | undefined | Optional schema definition for attributes |
Example
// Upsert documents in column-based format
await namespace.write({
upsertColumns: {
id: ["doc-1", "doc-2"],
vector: [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
text: ["Document 1", "Document 2"],
category: ["article", "blog"]
},
distanceMetric: "cosine_distance",
schema: {
text: { type: "string" },
category: { type: "string" }
}
});
// Upsert documents in row-based format
await namespace.write({
upsertRows: [
{
id: "doc-1",
vector: [0.1, 0.2, 0.3],
text: "Document 1",
category: "article"
},
{
id: "doc-2",
vector: [0.4, 0.5, 0.6],
text: "Document 2",
category: "blog"
}
],
distanceMetric: "cosine_distance"
});
// Delete documents by ID
await namespace.write({
deletes: ["doc-1", "doc-2"]
});
// Delete documents by filter
await namespace.write({
deleteByFilter: [
"And",
[
["category", "Eq", "article"],
["createdAt", "Lt", "2023-01-01"]
]
]
});
// Patch documents (update specific fields)
await namespace.write({
patchRows: [
{
id: "doc-1",
category: "updated-category"
}
]
});
Return value
Returns the number of rows affected by the operation.
query
Searches for similar vectors based on a query vector.
async query(options: QueryOptions): Promise<QueryResults>
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
vector | number[] | Required | Query vector for similarity search |
topK | number | 10 | Number of results to return |
includeVectors | boolean | false | Whether to include vectors in results |
includeAttributes | boolean | string[] | true | Include all attributes or specified ones |
filters | Filters | undefined | Metadata filters |
rankBy | RankBy | undefined | Attribute-based ranking or text ranking |
consistency | string | undefined | Consistency level for reads |
Example
const results = await namespace.query({
vector: [0.1, 0.2, 0.3, ...], // Query vector
topK: 10, // Number of results to return
includeVectors: false, // Whether to include vectors in results
includeAttributes: true, // Include all attributes or specific ones
filters: [ // Optional metadata filters
"And",
[
["category", "Eq", "article"],
["createdAt", "Gte", "2023-01-01"]
]
],
rankBy: ["attributes.importance", "asc"], // Optional attribute-based ranking
});
Return value
Returns an array of matched documents with similarity scores:
[
{
id: "doc-1", // Document ID
score: 0.87, // Similarity score (0-1)
vector?: number[], // Original vector (if includeVectors=true)
attributes?: { // Metadata (if includeAttributes=true)
text: "Document content",
category: "article",
createdAt: "2023-07-15"
}
},
// ...more results
]
getSchema
Retrieves the current schema for the namespace.
async getSchema(): Promise<Schema>
Example
const schema = await namespace.getSchema();
console.log(schema);
// {
// text: "string",
// category: "string",
// createdAt: "string"
// }
updateSchema
Updates the schema for the namespace.
async updateSchema(options: { schema: Schema }): Promise<Schema>
Parameters
Parameter | Type | Description |
---|---|---|
schema | Schema | New schema definition |
Example
const updatedSchema = await namespace.updateSchema({
schema: {
text: "string",
category: "string",
createdAt: "string",
newField: "number", // Add new field
tags: "string[]", // Add array field
},
});
Return value
Returns the updated schema.
getMetadata
Retrieves metadata about the namespace.
async getMetadata(): Promise<NamespaceMetadata>
Example
const metadata = await namespace.getMetadata();
console.log(metadata);
// {
// vectorCount: 1250,
// dimensions: 1536,
// distanceMetric: "cosine_distance",
// created: "2023-07-15T12:34:56Z"
// }
Namespace management
Higher-level operations for managing namespaces (these are accessed directly from the search object, not via useSearch
):
import { SearchClient } from "@gensx/storage";
const search = new SearchClient();
// List all namespaces
const { namespaces } = await search.listNamespaces();
// Check if namespace exists
const exists = await search.namespaceExists("my-namespace");
// Create namespace if it doesn't exist
const { created } = await search.ensureNamespace("my-namespace");
// Delete a namespace
const { deleted } = await search.deleteNamespace("old-namespace");
// Get a namespace directly for vector operations
const namespace = search.getNamespace("products");
// Write vectors using the namespace
await namespace.write({
upsertRows: [
{
id: "product-1",
vector: [0.1, 0.3, 0.5, ...], // embedding vector
name: "Ergonomic Chair",
category: "furniture",
price: 299.99
},
{
id: "product-2",
vector: [0.2, 0.4, 0.6, ...],
name: "Standing Desk",
category: "furniture",
price: 499.99
}
],
distanceMetric: "cosine_distance",
schema: {
name: { type: "string" },
category: { type: "string" },
price: { type: "number" }
}
});
// Query vectors directly with the namespace
const searchResults = await namespace.query({
vector: [0.15, 0.35, 0.55, ...], // query vector
topK: 5,
includeAttributes: true,
filters: [
"And",
[
["category", "Eq", "furniture"],
["price", "Lt", 400]
]
]
});
The SearchClient
is a standard typescript library and can be used outside of GenSX workflows in your normal application code as well.
useSearchStorage
Hook that provides direct access to the search storage instance, which includes higher-level namespace management functions.
Import
import { useSearchStorage } from "@gensx/storage";
Signature
function useSearchStorage(): SearchStorage;
Example
const searchStorage = useSearchStorage();
The search storage object provides these management methods:
getNamespace
Get a namespace object for direct interaction.
// Get a namespace directly (without calling useSearch)
const searchStorage = useSearchStorage();
const namespace = searchStorage.getNamespace("documents");
// Usage example
await namespace.write({
upsertRows: [...],
distanceMetric: "cosine_distance"
});
listNamespaces
List namespaces in your project.
const searchStorage = useSearchStorage();
const { namespaces, nextCursor } = await searchStorage.listNamespaces();
console.log("Namespaces:", namespaces); // ["docs", "products"]
The method accepts an options object with these properties:
Option | Type | Description |
---|---|---|
prefix | string | Optional prefix to filter namespace names by |
limit | number | Maximum number of results to return per page |
cursor | string | Cursor for pagination from previous response |
Returns an object with:
namespaces
: Array of namespace namesnextCursor
: Cursor for the next page, or undefined if no more results
ensureNamespace
Create a namespace if it doesn’t exist.
const searchStorage = useSearchStorage();
const { created } = await searchStorage.ensureNamespace("documents");
if (created) {
console.log("Namespace was created");
} else {
console.log("Namespace already existed");
}
deleteNamespace
Delete a namespace and all its data.
const searchStorage = useSearchStorage();
const { deleted } = await searchStorage.deleteNamespace("old-namespace");
if (deleted) {
console.log("Namespace was deleted");
} else {
console.log("Namespace could not be deleted");
}
namespaceExists
Check if a namespace exists.
const searchStorage = useSearchStorage();
const exists = await searchStorage.namespaceExists("documents");
if (exists) {
console.log("Namespace exists");
} else {
console.log("Namespace does not exist");
}
hasEnsuredNamespace
Check if a namespace has been ensured in the current session.
const searchStorage = useSearchStorage();
const isEnsured = searchStorage.hasEnsuredNamespace("documents");
if (isEnsured) {
console.log("Namespace has been ensured in this session");
} else {
console.log("Namespace has not been ensured yet");
}
SearchClient
The SearchClient
class provides a way to interact with GenSX vector search capabilities outside of the GenSX workflow context, such as from regular Node.js applications or server endpoints.
Import
import { SearchClient } from "@gensx/storage";
Constructor
constructor()
Example
const searchClient = new SearchClient();
Methods
getNamespace
Get a namespace instance and ensure it exists first.
async getNamespace(name: string): Promise<Namespace>
Example
const namespace = await searchClient.getNamespace("products");
// Then use the namespace to upsert or query vectors
await namespace.write({
upsertRows: [
{
id: "product-1",
vector: [0.1, 0.2, 0.3, ...],
name: "Product 1",
category: "electronics"
}
],
distanceMetric: "cosine_distance"
});
ensureNamespace
Create a namespace if it doesn’t exist.
async ensureNamespace(name: string): Promise<EnsureNamespaceResult>
Example
const { created } = await searchClient.ensureNamespace("products");
if (created) {
console.log("Namespace was created");
}
listNamespaces
List all namespaces.
async listNamespaces(options?: {
prefix?: string;
limit?: number;
cursor?: string;
}): Promise<{
namespaces: string[];
nextCursor?: string;
}>
Example
const { namespaces, nextCursor } = await searchClient.listNamespaces();
console.log("Available namespaces:", namespaces); // ["products", "customers", "orders"]
deleteNamespace
Delete a namespace.
async deleteNamespace(name: string): Promise<DeleteNamespaceResult>
Example
const { deleted } = await searchClient.deleteNamespace("temp-namespace");
if (deleted) {
console.log("Namespace was removed");
}
namespaceExists
Check if a namespace exists.
async namespaceExists(name: string): Promise<boolean>
Example
if (await searchClient.namespaceExists("products")) {
console.log("Products namespace exists");
} else {
console.log("Products namespace doesn't exist yet");
}
Usage in applications
The SearchClient is particularly useful when you need to access vector search functionality from:
- Regular Express.js or Next.js API routes
- Background jobs or workers
- Custom scripts or tools
- Any Node.js application outside the GenSX workflow context
// Example: Using SearchClient in an Express handler
import express from 'express';
import { SearchClient } from '@gensx/storage';
import { OpenAI } from 'openai';
const app = express();
const searchClient = new SearchClient();
const openai = new OpenAI();
app.post('/api/search', async (req, res) => {
try {
const { query } = req.body;
// Generate embedding for the query
const embedding = await openai.embeddings.create({
model: "text-embedding-3-small",
input: query
});
// Search for similar documents
const namespace = await searchClient.getNamespace('documents');
const results = await namespace.query({
vector: embedding.data[0].embedding,
topK: 5,
includeAttributes: true
});
res.json(results);
} catch (error) {
console.error('Search error:', error);
res.status(500).json({ error: 'Search error' });
}
});
app.listen(3000, () => {
console.log('Server running on port 3000');
});
Filter operators
Filters use a structured array format with the following pattern:
// Basic filter structure
[
"Operation", // And, Or, Not
[ // Array of conditions
["field", "Operator", value]
]
]
Available operators:
Operator | Description | Example |
---|---|---|
Eq | Equals | ["field", "Eq", "value"] |
Ne | Not equals | ["field", "Ne", "value"] |
Gt | Greater than | ["field", "Gt", 10] |
Gte | Greater than or equal | ["field", "Gte", 10] |
Lt | Less than | ["field", "Lt", 10] |
Lte | Less than or equal | ["field", "Lte", 10] |
In | In array | ["field", "In", ["a", "b"]] |
Nin | Not in array | ["field", "Nin", ["a", "b"]] |
Contains | String contains | ["field", "Contains", "text"] |
ContainsAny | Contains any of values | ["tags", "ContainsAny", ["news", "tech"]] |
ContainsAll | Contains all values | ["tags", "ContainsAll", ["imp", "urgent"]] |
RankBy options
The rankBy
parameter can be used in two primary ways:
Attribute-based ranking
Sorts by a field in ascending or descending order:
// Sort by the createdAt attribute in ascending order
rankBy: ["createdAt", "asc"]
// Sort by price in descending order (highest first)
rankBy: ["price", "desc"]
Text-based ranking
For full-text search relevance scoring:
// Basic BM25 text ranking
rankBy: ["text", "BM25", "search query"]
// BM25 with multiple search terms
rankBy: ["text", "BM25", ["term1", "term2"]]
// Combined text ranking strategies
rankBy: ["Sum", [
["text", "BM25", "search query"],
["text", "BM25", "another term"]
]]
// Weighted text ranking (multiply BM25 score by 0.5)
rankBy: ["Product", [["text", "BM25", "search query"], 0.5]]
// Alternative syntax for weighted ranking
rankBy: ["Product", [0.5, ["text", "BM25", "search query"]]]
Use these options to fine-tune the relevance and ordering of your search results.