Search reference
API reference for GenSX Cloud search components. Search is powered by turbopuffer, and their documentation for query and upsert operations is a useful reference to augment this document.
Installation
npm install @gensx/storage
useSearch
Hook that provides access to vector search for a specific namespace.
Import
import { useSearch } from "@gensx/storage";
Signature
function useSearch(
name: string,
options?: SearchStorageOptions,
): Promise<Namespace>;
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
name | string | Required | The namespace name to access |
options | SearchStorageOptions | {} | Optional configuration properties |
Returns
Returns a namespace object with methods to interact with vector search.
Example
// Simple usage
const namespace = await useSearch("documents");
const results = await namespace.query({
vector: queryEmbedding,
includeAttributes: true,
});
// With configuration
const namespace = await useSearch("documents", {
project: "my-project",
environment: "production",
});
Namespace methods
The namespace object returned by useSearch
provides these methods:
write
Inserts, updates, or deletes vectors in the namespace.
async write(options: WriteParams): Promise<number>
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
upsertColumns | UpsertColumns | undefined | Column-based format for upserting documents |
upsertRows | UpsertRows | undefined | Row-based format for upserting documents |
patchColumns | PatchColumns | undefined | Column-based format for patching documents |
patchRows | PatchRows | undefined | Row-based format for patching documents |
deletes | Id[] | undefined | Array of document IDs to delete |
deleteByFilter | Filters | undefined | Filter to match documents for deletion |
distanceMetric | DistanceMetric | undefined | Distance metric for similarity calculations |
schema | Schema | undefined | Optional schema definition for attributes |
Example
// Upsert documents in column-based format
await namespace.write({
upsertColumns: {
id: ["doc-1", "doc-2"],
vector: [
[0.1, 0.2, 0.3],
[0.4, 0.5, 0.6],
],
text: ["Document 1", "Document 2"],
category: ["article", "blog"],
},
distanceMetric: "cosine_distance",
schema: {
text: { type: "string" },
category: { type: "string" },
},
});
// Upsert documents in row-based format
await namespace.write({
upsertRows: [
{
id: "doc-1",
vector: [0.1, 0.2, 0.3],
text: "Document 1",
category: "article",
},
{
id: "doc-2",
vector: [0.4, 0.5, 0.6],
text: "Document 2",
category: "blog",
},
],
distanceMetric: "cosine_distance",
});
// Delete documents by ID
await namespace.write({
deletes: ["doc-1", "doc-2"],
});
// Delete documents by filter
await namespace.write({
deleteByFilter: [
"And",
[
["category", "Eq", "article"],
["createdAt", "Lt", "2023-01-01"],
],
],
});
// Patch documents (update specific fields)
await namespace.write({
patchRows: [
{
id: "doc-1",
category: "updated-category",
},
],
});
Return value
Returns the number of rows affected by the operation.
query
Searches for similar vectors based on a query vector.
async query(options: QueryOptions): Promise<QueryResults>
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
vector | number[] | Required | Query vector for similarity search |
topK | number | 10 | Number of results to return |
includeVectors | boolean | false | Whether to include vectors in results |
includeAttributes | boolean | string[] | true | Include all attributes or specified ones |
filters | Filters | undefined | Metadata filters |
rankBy | RankBy | undefined | Attribute-based ranking or text ranking |
consistency | string | undefined | Consistency level for reads |
Example
const results = await namespace.query({
vector: [0.1, 0.2, 0.3, ...], // Query vector
topK: 10, // Number of results to return
includeVectors: false, // Whether to include vectors in results
includeAttributes: true, // Include all attributes or specific ones
filters: [ // Optional metadata filters
"And",
[
["category", "Eq", "article"],
["createdAt", "Gte", "2023-01-01"]
]
],
rankBy: ["attributes.importance", "asc"], // Optional attribute-based ranking
});
Return value
Returns an array of matched documents with similarity scores:
[
{
id: "doc-1", // Document ID
score: 0.87, // Similarity score (0-1)
vector?: number[], // Original vector (if includeVectors=true)
attributes?: { // Metadata (if includeAttributes=true)
text: "Document content",
category: "article",
createdAt: "2023-07-15"
}
},
// ...more results
]
getSchema
Retrieves the current schema for the namespace.
async getSchema(): Promise<Schema>
Example
const schema = await namespace.getSchema();
console.log(schema);
// {
// text: "string",
// category: "string",
// createdAt: "string"
// }
updateSchema
Updates the schema for the namespace.
async updateSchema(options: { schema: Schema }): Promise<Schema>
Parameters
Parameter | Type | Description |
---|---|---|
schema | Schema | New schema definition |
Example
const updatedSchema = await namespace.updateSchema({
schema: {
text: "string",
category: "string",
createdAt: "string",
newField: "number", // Add new field
tags: "string[]", // Add array field
},
});
Return value
Returns the updated schema.
getMetadata
Retrieves metadata about the namespace.
async getMetadata(): Promise<NamespaceMetadata>
Example
const metadata = await namespace.getMetadata();
console.log(metadata);
// {
// vectorCount: 1250,
// dimensions: 1536,
// distanceMetric: "cosine_distance",
// created: "2023-07-15T12:34:56Z"
// }
SearchClient
The SearchClient
class provides a way to interact with GenSX vector search capabilities outside of the GenSX workflow context, such as from regular Node.js applications or server endpoints.
Import
import { SearchClient } from "@gensx/storage";
Constructor
constructor(options?: SearchStorageOptions);
Parameters
Parameter | Type | Default | Description |
---|---|---|---|
options | SearchStorageOptions | {} | Optional configuration properties |
Example
// Default client
const searchClient = new SearchClient();
// With configuration
const searchClient = new SearchClient({
project: "my-project",
environment: "production",
});
Methods
getNamespace
Get a namespace instance and ensure it exists first.
async getNamespace(name: string): Promise<Namespace>
Example
const namespace = await searchClient.getNamespace("products");
// Then use the namespace to upsert or query vectors
await namespace.write({
upsertRows: [
{
id: "product-1",
vector: [0.1, 0.2, 0.3, ...],
name: "Product 1",
category: "electronics"
}
],
distanceMetric: "cosine_distance"
});
ensureNamespace
Create a namespace if it doesn’t exist.
async ensureNamespace(name: string): Promise<EnsureNamespaceResult>
Example
const { created } = await searchClient.ensureNamespace("products");
if (created) {
console.log("Namespace was created");
}
listNamespaces
List all namespaces.
async listNamespaces(options?: {
prefix?: string;
limit?: number;
cursor?: string;
}): Promise<{
namespaces: string[];
nextCursor?: string;
}>
Example
const { namespaces, nextCursor } = await searchClient.listNamespaces();
console.log("Available namespaces:", namespaces); // ["products", "customers", "orders"]
deleteNamespace
Delete a namespace.
async deleteNamespace(name: string): Promise<DeleteNamespaceResult>
Example
const { deleted } = await searchClient.deleteNamespace("temp-namespace");
if (deleted) {
console.log("Namespace was removed");
}
namespaceExists
Check if a namespace exists.
async namespaceExists(name: string): Promise<boolean>
Example
if (await searchClient.namespaceExists("products")) {
console.log("Products namespace exists");
} else {
console.log("Products namespace doesn't exist yet");
}
Usage in applications
The SearchClient is particularly useful when you need to access vector search functionality from:
- Regular Express.js or Next.js API routes
- Background jobs or workers
- Custom scripts or tools
- Any Node.js application outside the GenSX workflow context
// Example: Using SearchClient in an Express handler
import express from "express";
import { SearchClient } from "@gensx/storage";
import { OpenAI } from "openai";
const app = express();
const searchClient = new SearchClient();
const openai = new OpenAI();
app.post("/api/search", async (req, res) => {
try {
const { query } = req.body;
// Generate embedding for the query
const embedding = await openai.embeddings.create({
model: "text-embedding-3-small",
input: query,
});
// Search for similar documents
const namespace = await searchClient.getNamespace("documents");
const results = await namespace.query({
vector: embedding.data[0].embedding,
topK: 5,
includeAttributes: true,
});
res.json(results);
} catch (error) {
console.error("Search error:", error);
res.status(500).json({ error: "Search error" });
}
});
app.listen(3000, () => {
console.log("Server running on port 3000");
});
Filter operators
Filters use a structured array format with the following pattern:
// Basic filter structure
[
"Operation", // And, Or, Not
[
// Array of conditions
["field", "Operator", value],
],
];
Available operators:
Operator | Description | Example |
---|---|---|
Eq | Equals | ["field", "Eq", "value"] |
Ne | Not equals | ["field", "Ne", "value"] |
Gt | Greater than | ["field", "Gt", 10] |
Gte | Greater than or equal | ["field", "Gte", 10] |
Lt | Less than | ["field", "Lt", 10] |
Lte | Less than or equal | ["field", "Lte", 10] |
In | In array | ["field", "In", ["a", "b"]] |
Nin | Not in array | ["field", "Nin", ["a", "b"]] |
Contains | String contains | ["field", "Contains", "text"] |
ContainsAny | Contains any of values | ["tags", "ContainsAny", ["news", "tech"]] |
ContainsAll | Contains all values | ["tags", "ContainsAll", ["imp", "urgent"]] |
RankBy options
The rankBy
parameter can be used in two primary ways:
Attribute-based ranking
Sorts by a field in ascending or descending order:
// Sort by the createdAt attribute in ascending order
rankBy: ["createdAt", "asc"];
// Sort by price in descending order (highest first)
rankBy: ["price", "desc"];
Text-based ranking
For full-text search relevance scoring:
// Basic BM25 text ranking
rankBy: ["text", "BM25", "search query"];
// BM25 with multiple search terms
rankBy: ["text", "BM25", ["term1", "term2"]];
// Combined text ranking strategies
rankBy: [
"Sum",
[
["text", "BM25", "search query"],
["text", "BM25", "another term"],
],
];
// Weighted text ranking (multiply BM25 score by 0.5)
rankBy: ["Product", [["text", "BM25", "search query"], 0.5]];
// Alternative syntax for weighted ranking
rankBy: ["Product", [0.5, ["text", "BM25", "search query"]]];
Use these options to fine-tune the relevance and ordering of your search results.
SearchStorageOptions
Configuration properties for search operations.
Prop | Type | Default | Description |
---|---|---|---|
project | string | Auto-detected | Project to use for cloud storage. If you don’t set this, it’ll first check your GENSX_PROJECT environment variable, then look for the project name in your local gensx.yaml file. |
environment | string | Auto-detected | Environment to use for cloud storage. If you don’t set this, it’ll first check your GENSX_ENV environment variable, then use whatever environment you’ve selected in the CLI with gensx env select . |