Search reference

API reference for GenSX Cloud search components. Search is powered by turbopuffer, and their documentation for query and write operations is a useful reference to augment this document.

Installation


npm install @gensx/storage

useSearch

Hook that provides access to vector search for a specific namespace.

Import


import { useSearch } from "@gensx/storage";

Signature


function useSearch(
  name: string,
  options?: SearchStorageOptions,
): Promise<Namespace>;

Parameters

Parameter	Type	Default	Description
`name`	`string`	Required	The namespace name to access
`options`	`SearchStorageOptions`	`{}`	Optional configuration properties

Returns

Returns a namespace object with methods to interact with vector search.

Example


// Simple usage
const namespace = await useSearch("documents");
const results = await namespace.query({
  rankBy: ["vector", "ANN", queryEmbedding], // queryEmbedding is a number[]
  topK: 10,
  includeAttributes: true,
});
 
// With explicit configuration
const configuredNamespace = await useSearch("documents", {
  project: "my-project",
  environment: "production",
});

Namespace methods

The namespace object returned by useSearch provides these methods:

write

Inserts, updates, or deletes vectors in the namespace.


async write(options: WriteParams): Promise<{ message: string; rowsAffected: number }>

Parameters

Parameter	Type	Default	Description
`upsertColumns`	`UpsertColumns`	none	Column-based format for upserting documents
`upsertRows`	`UpsertRows`	none	Row-based format for upserting documents
`patchColumns`	`PatchColumns`	none	Column-based format for patching documents
`patchRows`	`PatchRows`	none	Row-based format for patching documents
`deletes`	`ID[]`	none	Array of document IDs to delete
`deleteByFilter`	`Filter`	none	Filter to match documents for deletion
`distanceMetric`	`DistanceMetric`	none	Distance metric for similarity calculations
`schema`	`Schema`	none	Optional schema definition for attributes

Example


// Upsert documents in column-based format
const result = await namespace.write({
  upsertColumns: {
    id: ["doc-1", "doc-2"],
    vector: [
      [0.1, 0.2, 0.3],
      [0.4, 0.5, 0.6],
    ],
    text: ["Document 1", "Document 2"],
    category: ["article", "blog"],
  },
  distanceMetric: "cosine_distance",
  schema: {
    text: { type: "string" },
    category: { type: "string" },
  },
});
console.log(result); // { message: "Successfully wrote 2 rows", rowsAffected: 2 }
 
// Upsert documents in row-based format
await namespace.write({
  upsertRows: [
    {
      id: "doc-1",
      vector: [0.1, 0.2, 0.3],
      text: "Document 1",
      category: "article",
    },
    {
      id: "doc-2",
      vector: [0.4, 0.5, 0.6],
      text: "Document 2",
      category: "blog",
    },
  ],
  distanceMetric: "cosine_distance",
});
 
// Delete documents by ID
await namespace.write({
  deletes: ["doc-1", "doc-2"],
});
 
// Delete documents by filter
await namespace.write({
  deleteByFilter: [
    "And",
    [
      ["category", "Eq", "article"],
      ["createdAt", "Lt", "2023-01-01"],
    ],
  ],
});
 
// Patch documents (update specific fields)
await namespace.write({
  patchRows: [
    {
      id: "doc-1",
      category: "updated-category",
    },
  ],
});

Return value

Returns an object with a success message and the number of rows affected by the operation:


{
  message: "Successfully wrote 2 rows",
  rowsAffected: 2
}
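
The row-based and column-based upsert formats carry the same data in different layouts. As a minimal sketch of how they relate, the hypothetical helper below (`rowsToColumns` and its types are illustrative names, not part of `@gensx/storage`) pivots an `upsertRows`-style array into the `upsertColumns` shape, assuming every row has the same keys as the first one.

```typescript
// Hypothetical helper, not part of @gensx/storage: pivots row-based
// documents into the column-based layout used by upsertColumns.
type DocValue = string | number | boolean | number[] | string[];
type DocRow = Record<string, DocValue>;
type DocColumns = Record<string, DocValue[]>;

function rowsToColumns(rows: DocRow[]): DocColumns {
  const columns: DocColumns = {};
  if (rows.length === 0) return columns;

  // Assumption: the first row defines the full set of keys.
  const keys = Object.keys(rows[0]);
  for (const key of keys) columns[key] = [];

  for (const row of rows) {
    for (const key of keys) {
      if (!(key in row)) {
        throw new Error(`Row ${String(row.id)} is missing "${key}"`);
      }
      columns[key].push(row[key]);
    }
  }
  return columns;
}

const columns = rowsToColumns([
  { id: "doc-1", vector: [0.1, 0.2, 0.3], text: "Document 1", category: "article" },
  { id: "doc-2", vector: [0.4, 0.5, 0.6], text: "Document 2", category: "blog" },
]);
console.log(JSON.stringify(columns.id)); // ["doc-1","doc-2"]
```

The resulting object could then be passed as `upsertColumns` in a `write` call, in place of the equivalent `upsertRows` array.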

query

Searches for similar vectors based on a query vector or other ranking criteria.


async query(options: QueryOptions): Promise<QueryResults>

Parameters

Parameter	Type	Default	Description
`rankBy`	`RankBy`	none	Vector, text, or attribute-based ranking
`topK`	`number`	none	Number of results to return
`includeAttributes`	`boolean | string[]`	`['id']`	Include all attributes or specified ones
`filters`	`Filter`	none	Metadata filters
`aggregateBy`	`AggregateBy`	none	Aggregate results by specified fields
`consistency`	`Consistency`	none	Consistency level for reads

Example


const results = await namespace.query({
  topK: 10,                         // Number of results to return
  includeAttributes: true,          // Include all attributes or specific ones
  filters: [                        // Optional metadata filters
    "And",
    [
      ["category", "Eq", "article"],
      ["createdAt", "Gte", "2023-01-01"],
    ],
  ],
  // Pass exactly one rankBy; the alternatives are commented out
  rankBy: ["vector", "ANN", [0.1, 0.2, 0.3 /* ... */]], // Vector similarity search
  // rankBy: ["text", "BM25", "search query"],          // Text search
  // rankBy: ["importance", "desc"],                    // Attribute-based ranking
});

Return value

Returns a QueryResults object with an array of matched documents:


{
  rows: [
    {
      id: "doc-1",            // Document ID
      $dist: 0.13,            // Distance score (lower is more similar for most metrics)
      vector: [0.1, 0.2, 0.3],      // If specified in includeAttributes
      text: "Document content",     // Other attributes specified in includeAttributes
      category: "article",
      createdAt: "2023-07-15"
    },
    // ...more results
  ],
  aggregations: {          // Aggregation results (if aggregateBy was specified)
    "numberOfDocuments": 100,
  }
}
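
Because `$dist` is a distance, lower values indicate closer matches for most metrics. As a rough sketch of post-processing the result shape shown above, the snippet below keeps only rows under a distance threshold. The `ResultRow` and `ResultSet` types and the `closeMatches` helper are local, hypothetical names that mirror the example output; they are not imported from `@gensx/storage`.

```typescript
// Local types for illustration only; they mirror the result shape shown
// above and are not imported from @gensx/storage.
type ResultRow = { id: string | number; $dist?: number } & Record<string, unknown>;
type ResultSet = { rows: ResultRow[] };

// Keep only rows whose distance is at most maxDist.
// Rows without a numeric $dist are dropped.
function closeMatches(results: ResultSet, maxDist: number): ResultRow[] {
  return results.rows.filter(
    (row) => typeof row.$dist === "number" && row.$dist <= maxDist,
  );
}

const sample: ResultSet = {
  rows: [
    { id: "doc-1", $dist: 0.13, category: "article" },
    { id: "doc-2", $dist: 0.42, category: "blog" },
  ],
};
console.log(closeMatches(sample, 0.2).map((row) => row.id)); // only "doc-1" remains
```

A suitable threshold depends on the distance metric and the embedding model, so the value `0.2` here is purely illustrative.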

getSchema

Retrieves the current schema for the namespace.


async getSchema(): Promise<Schema>

Example


const schema = await namespace.getSchema();
console.log(schema);
// {
//   text: "string",
//   category: "string",
//   createdAt: "string"
// }

updateSchema

Updates the schema for the namespace.


async updateSchema(options: { schema: Schema }): Promise<Schema>

Parameters

Parameter	Type	Description
`schema`	`Schema`	New schema definition

Example


const updatedSchema = await namespace.updateSchema({
  schema: {
    text: "string",
    category: "string",
    createdAt: "string",
    newField: "number", // Add new field
    tags: "string[]", // Add array field
  },
});

Return value

Returns the updated schema.

SearchClient

The SearchClient class provides a way to interact with GenSX vector search capabilities outside of the GenSX workflow context, such as from regular Node.js applications or server endpoints.

Import


import { SearchClient } from "@gensx/storage";

Constructor


constructor(options?: SearchStorageOptions);

Parameters

Parameter	Type	Default	Description
`options`	`SearchStorageOptions`	`{}`	Optional configuration properties

Example


// Default client
const searchClient = new SearchClient();
 
// With configuration
const configuredClient = new SearchClient({
  project: "my-project",
  environment: "production",
});

Methods

getNamespace

Returns a namespace instance, first ensuring that the namespace exists.


async getNamespace(name: string): Promise<Namespace>

Example


const namespace = await searchClient.getNamespace("products");
 
// Then use the namespace to upsert or query vectors
await namespace.write({
  upsertRows: [
    {
      id: "product-1",
      vector: [0.1, 0.2, 0.3 /* ... */],
      name: "Product 1",
      category: "electronics"
    }
  ],
  distanceMetric: "cosine_distance"
});

ensureNamespace

Create a namespace if it doesn’t exist.


async ensureNamespace(name: string): Promise<EnsureNamespaceResult>

Example


const { created } = await searchClient.ensureNamespace("products");
if (created) {
  console.log("Namespace was created");
}

listNamespaces

Lists namespaces, optionally filtered by `prefix` and paginated with `limit` and `cursor`.


async listNamespaces(options?: {
  prefix?: string;
  limit?: number;
  cursor?: string;
}): Promise<{
  namespaces: { name: string; createdAt: Date }[];
  nextCursor?: string;
}>

Example


const { namespaces, nextCursor } = await searchClient.listNamespaces();
console.log("Available namespaces:", namespaces.map(ns => ns.name)); // ["products", "customers", "orders"]
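
The `cursor` option and `nextCursor` result suggest cursor-based pagination. The sketch below shows one way to walk every page. It assumes that `nextCursor` is absent once the last page has been returned, which the signature implies but does not state. `NamespaceLister` and `listAllNamespaceNames` are hypothetical names; the structural type covers only the `listNamespaces` signature shown above, so the helper accepts a stub as well as a real `SearchClient`.

```typescript
// Structural type covering only the listNamespaces signature shown above.
type NamespaceLister = {
  listNamespaces(options?: {
    prefix?: string;
    limit?: number;
    cursor?: string;
  }): Promise<{
    namespaces: { name: string; createdAt: Date }[];
    nextCursor?: string;
  }>;
};

// Hypothetical helper: collects namespace names across all pages.
// Assumes nextCursor is missing (or empty) after the final page.
async function listAllNamespaceNames(
  client: NamespaceLister,
  prefix?: string,
): Promise<string[]> {
  const names: string[] = [];
  let cursor: string | undefined;
  do {
    const page = await client.listNamespaces({ prefix, limit: 100, cursor });
    names.push(...page.namespaces.map((ns) => ns.name));
    cursor = page.nextCursor;
  } while (cursor);
  return names;
}
```

With a real client this would be called as `await listAllNamespaceNames(searchClient)`; the page size of 100 is an arbitrary choice for the sketch.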

deleteNamespace

Delete a namespace.


async deleteNamespace(name: string): Promise<DeleteNamespaceResult>

Example


const { deleted } = await searchClient.deleteNamespace("temp-namespace");
if (deleted) {
  console.log("Namespace was removed");
}

namespaceExists

Check if a namespace exists.


async namespaceExists(name: string): Promise<boolean>

Example


if (await searchClient.namespaceExists("products")) {
  console.log("Products namespace exists");
} else {
  console.log("Products namespace doesn't exist yet");
}

Usage in applications

The SearchClient is particularly useful when you need to access vector search functionality from:

Regular Express.js or Next.js API routes
Background jobs or workers
Custom scripts or tools
Any Node.js application outside the GenSX workflow context


// Example: Using SearchClient in an Express handler
import express from "express";
import { SearchClient } from "@gensx/storage";
import { OpenAI } from "openai";
 
const app = express();
app.use(express.json()); // Parse JSON request bodies so req.body is populated
const searchClient = new SearchClient();
const openai = new OpenAI();
 
app.post("/api/search", async (req, res) => {
  try {
    const { query } = req.body;
 
    // Generate embedding for the query
    const embedding = await openai.embeddings.create({
      model: "text-embedding-3-small",
      input: query,
    });
 
    // Search for similar documents
    const namespace = await searchClient.getNamespace("documents");
    const results = await namespace.query({
      rankBy: ["vector", "ANN", embedding.data[0].embedding],
      topK: 5,
      includeAttributes: true,
    });
 
    res.json(results);
  } catch (error) {
    console.error("Search error:", error);
    res.status(500).json({ error: "Search error" });
  }
});
 
app.listen(3000, () => {
  console.log("Server running on port 3000");
});

Filter operators

Filters use a structured array format with the following pattern:


// Basic filter structure
[
  "Operation", // And, Or, Not
  [
    // Array of conditions
    ["field", "Operator", value],
  ],
];

Available operators:

Operator	Description	Example
`Eq`	Equals	`["field", "Eq", "value"]`
`Ne`	Not equals	`["field", "Ne", "value"]`
`Gt`	Greater than	`["field", "Gt", 10]`
`Gte`	Greater than or equal	`["field", "Gte", 10]`
`Lt`	Less than	`["field", "Lt", 10]`
`Lte`	Less than or equal	`["field", "Lte", 10]`
`In`	In array	`["field", "In", ["a", "b"]]`
`Nin`	Not in array	`["field", "Nin", ["a", "b"]]`
`Contains`	String contains	`["field", "Contains", "text"]`
`ContainsAny`	Contains any of values	`["tags", "ContainsAny", ["news", "tech"]]`
`ContainsAll`	Contains all values	`["tags", "ContainsAll", ["imp", "urgent"]]`
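
Nested array literals are easy to mistype, so it can help to build filters through small typed helpers. The sketch below is illustrative only: `cond`, `and`, `or`, and the `Filter*` types are hypothetical names that are not exported by `@gensx/storage`; they simply produce the array shapes documented above. Allowing groups to nest inside other groups is an assumption, and `Not` is omitted because its exact shape is not shown here.

```typescript
// Illustrative types and helpers; not exported by @gensx/storage.
type FilterValue = string | number | boolean | string[] | number[];
type FilterOp =
  | "Eq" | "Ne" | "Gt" | "Gte" | "Lt" | "Lte"
  | "In" | "Nin" | "Contains" | "ContainsAny" | "ContainsAll";
type FilterCondition = [field: string, op: FilterOp, value: FilterValue];
type FilterGroup = ["And" | "Or", FilterExpr[]];
type FilterExpr = FilterCondition | FilterGroup;

// A single ["field", "Operator", value] condition.
const cond = (field: string, op: FilterOp, value: FilterValue): FilterCondition =>
  [field, op, value];

// Logical groups in the ["Operation", [conditions]] shape.
const and = (...exprs: FilterExpr[]): FilterGroup => ["And", exprs];
const or = (...exprs: FilterExpr[]): FilterGroup => ["Or", exprs];

const recentArticles = and(
  cond("category", "Eq", "article"),
  cond("createdAt", "Gte", "2023-01-01"),
);
console.log(JSON.stringify(recentArticles));
// ["And",[["category","Eq","article"],["createdAt","Gte","2023-01-01"]]]
```

The value of `recentArticles` matches the `filters` array used in the `query` example, so it could be passed as `filters` directly.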

RankBy options

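In addition to vector ranking (`["vector", "ANN", queryVector]`, as in the query example above), the rankBy parameter supports two other kinds of ranking: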

Attribute-based ranking

Sorts by a field in ascending or descending order:


// Sort by the createdAt attribute in ascending order
rankBy: ["createdAt", "asc"];
 
// Sort by price in descending order (highest first)
rankBy: ["price", "desc"];

Text-based ranking

For full-text search relevance scoring:


// Basic BM25 text ranking
rankBy: ["text", "BM25", "search query"];
 
// BM25 with multiple search terms
rankBy: ["text", "BM25", ["term1", "term2"]];
 
// Combined text ranking strategies
rankBy: [
  "Sum",
  [
    ["text", "BM25", "search query"],
    ["text", "BM25", "another term"],
  ],
];
 
// Weighted text ranking (multiply BM25 score by 0.5)
rankBy: ["Product", [["text", "BM25", "search query"], 0.5]];
 
// Alternative syntax for weighted ranking
rankBy: ["Product", [0.5, ["text", "BM25", "search query"]]];

Use these options to fine-tune the relevance and ordering of your search results.
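
As a sketch of how the `Sum` and `Product` forms might be combined, the hypothetical helper below builds a weighted BM25 ranking across several fields. It assumes that `Product` expressions may be nested inside `Sum`, which the examples above suggest but do not show explicitly. `weightedBm25`, its types, and the `title` field are illustrative and not part of `@gensx/storage`.

```typescript
// Illustrative types; they describe only the array shapes used in this sketch.
type Bm25 = [field: string, method: "BM25", query: string];
type Weighted = ["Product", [number, Bm25]];
type SumRank = ["Sum", (Bm25 | Weighted)[]];

// Hypothetical helper: one weighted BM25 term per field, summed together.
// Assumes Product expressions can be nested inside Sum.
function weightedBm25(query: string, weights: Record<string, number>): SumRank {
  const terms = Object.entries(weights).map(
    ([field, weight]): Weighted => ["Product", [weight, [field, "BM25", query]]],
  );
  return ["Sum", terms];
}

const rankBy = weightedBm25("search query", { title: 2, text: 1 });
console.log(JSON.stringify(rankBy));
// ["Sum",[["Product",[2,["title","BM25","search query"]]],["Product",[1,["text","BM25","search query"]]]]]
```

Here matches in the hypothetical `title` field count twice as much as matches in `text`.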

SearchStorageOptions

Configuration properties for search operations.

Prop	Type	Default	Description
`project`	`string`	Auto-detected	Project to use for cloud storage. If you don’t set this, it’ll first check your `GENSX_PROJECT` environment variable, then look for the project name in your local `gensx.yaml` file.
`environment`	`string`	Auto-detected	Environment to use for cloud storage. If you don’t set this, it’ll first check your `GENSX_ENV` environment variable, then use whatever environment you’ve selected in the CLI with `gensx env select`.
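
The lookup order described in the table can be pictured as a simple fallback chain: an explicit option wins, then the environment variable, then the locally configured value. The sketch below only illustrates that precedence. `resolveSetting` is a hypothetical name and this is not the library's actual implementation.

```typescript
// Illustration of the documented precedence only; not the library's code.
// explicit: value passed in SearchStorageOptions
// envValue: value of GENSX_PROJECT or GENSX_ENV, if set
// fallback: value from gensx.yaml or the CLI-selected environment
function resolveSetting(
  explicit: string | undefined,
  envValue: string | undefined,
  fallback: string | undefined,
): string | undefined {
  return explicit ?? envValue ?? fallback;
}

console.log(resolveSetting(undefined, "env-project", "yaml-project")); // "env-project"
console.log(resolveSetting("my-project", "env-project", "yaml-project")); // "my-project"
```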