Dagster & Chroma

Community integration

This is a community-maintained integration. To report bugs or leave feedback, open an issue in the Dagster community integrations repo.

The dagster-chroma library lets you use Chroma's vector database from Dagster to build AI-powered data pipelines. You can run vector similarity searches, manage collections, and add or query documents directly from your Dagster assets.

Installation

uv add dagster-chroma

Example

import os

from dagster_chroma import ChromaResource, HttpConfig, LocalConfig

import dagster as dg


@dg.asset
def my_table(chroma: ChromaResource):
    with chroma.get_client() as chroma_client:
        # get_or_create_collection keeps the asset re-runnable;
        # create_collection raises if "fruits" already exists.
        collection = chroma_client.get_or_create_collection("fruits")

        # Store the documents under explicit, stable ids.
        collection.add(
            documents=[
                "This is a document about oranges",
                "This is a document about pineapples",
                "This is a document about strawberries",
                "This is a document about cucumbers",
            ],
            ids=["oranges", "pineapples", "strawberries", "cucumbers"],
        )

        # Fetch the single document closest to the query text.
        results = collection.query(
            query_texts=["hawaii"],
            n_results=1,
        )


defs = dg.Definitions(
    assets=[my_table],
    resources={
        # Use a local on-disk store when DEV is set, otherwise a Chroma server.
        "chroma": ChromaResource(
            connection_config=LocalConfig(persistence_path="./chroma")
            if os.getenv("DEV")
            else HttpConfig(host="192.168.0.10", port=8000)
        ),
    },
)
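The example above assigns the query output to `results` but does not use it. Chroma returns a dictionary of parallel lists with one inner list per query text, so a small helper can flatten the hits for the first query into records. The following is a minimal sketch: `first_query_hits` is a hypothetical helper name, and `sample_results` is a hand-written stand-in with a placeholder distance, not real Chroma output.

```python
from typing import Any


def first_query_hits(results: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten the hits for the first query text into a list of records."""
    ids = results["ids"][0]
    # "documents" and "distances" may be absent or None if they were not
    # requested, so fall back to a list of None with the same length.
    documents = (results.get("documents") or [[None] * len(ids)])[0]
    distances = (results.get("distances") or [[None] * len(ids)])[0]
    return [
        {"id": doc_id, "document": doc, "distance": dist}
        for doc_id, doc, dist in zip(ids, documents, distances)
    ]


# Hand-written stand-in for a query result; the distance value is a
# placeholder chosen for illustration, not something Chroma produced.
sample_results = {
    "ids": [["pineapples"]],
    "documents": [["This is a document about pineapples"]],
    "distances": [[0.5]],
}
hits = first_query_hits(sample_results)
```

Inside the asset, the same helper could be called on `results` before returning or logging the hits.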

About Chroma

Chroma is the open-source AI application database. Chroma makes it easy to build LLM apps by making knowledge, facts, and skills pluggable for LLMs. It provides a simple API for storing and querying embeddings, documents, and metadata. Chroma can be used to build semantic search, question answering, and other AI-powered applications. The database can run embedded in your application or as a separate service.
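To make "querying embeddings" concrete, here is a toy nearest-neighbor lookup in plain Python. It is only an illustration of the idea: the two-dimensional vectors in `toy_store` are made up, `nearest` is a hypothetical helper, and Chroma's actual embedding functions, indexing, and distance metrics are not reproduced here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query: list[float], store: dict[str, list[float]], n_results: int = 1) -> list[str]:
    """Return the ids of the n_results stored vectors most similar to query."""
    ranked = sorted(
        store,
        key=lambda doc_id: cosine_similarity(query, store[doc_id]),
        reverse=True,
    )
    return ranked[:n_results]


# Made-up 2-D "embeddings"; real embeddings have hundreds of dimensions.
toy_store = {
    "oranges": [1.0, 0.0],
    "cucumbers": [0.0, 1.0],
}
best = nearest([0.9, 0.1], toy_store)
```

A vector database does the same kind of ranking, but over embeddings produced by a model and with an index that avoids comparing the query against every stored vector.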