Dagster & Chroma
This is a community-maintained integration. To report bugs or leave feedback, open an issue in the Dagster community integrations repo.
The dagster-chroma library lets you use Chroma's vector database from Dagster to build AI-powered data pipelines. You can run vector similarity searches, manage schemas, and handle data operations directly from your Dagster assets.
Installation
- uv: uv add dagster-chroma
- pip: pip install dagster-chroma
Example
import os

from dagster_chroma import ChromaResource, HttpConfig, LocalConfig
import dagster as dg


@dg.asset
def my_table(chroma: ChromaResource):
    # get_client() is used as a context manager and provides a Chroma client.
    with chroma.get_client() as chroma_client:
        collection = chroma_client.create_collection("fruits")

        # Only documents and ids are passed, so the collection's embedding
        # function computes the embeddings.
        collection.add(
            documents=[
                "This is a document about oranges",
                "This is a document about pineapples",
                "This is a document about strawberries",
                "This is a document about cucumbers",
            ],
            ids=["oranges", "pineapples", "strawberries", "cucumbers"],
        )

        # Fetch the single stored document closest to the query text.
        results = collection.query(
            query_texts=["hawaii"],
            n_results=1,
        )


defs = dg.Definitions(
    assets=[my_table],
    resources={
        # Local persistent storage when DEV is set, otherwise a Chroma server over HTTP.
        "chroma": ChromaResource(
            connection_config=LocalConfig(persistence_path="./chroma")
            if os.getenv("DEV")
            else HttpConfig(host="192.168.0.10", port=8000)
        ),
    },
)
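The `results` value in the asset above is not used further. As a rough sketch of how it could be read: Chroma's `query` returns a dictionary whose `ids` and `documents` entries each hold one inner list per query text. The `sample_results` dictionary below is a hand-written stand-in with that shape, not real query output, and `top_hits` is a hypothetical helper that is not part of dagster-chroma or Chroma.

```python
from typing import Any


def top_hits(results: dict[str, Any]) -> list[tuple[str, str]]:
    """Return the best (id, document) pair for each query text."""
    hits = []
    # "ids" and "documents" are parallel lists of lists: one inner list per query.
    for ids, documents in zip(results["ids"], results["documents"]):
        if ids:  # a query can come back with no matches
            hits.append((ids[0], documents[0]))
    return hits


# Hand-written stand-in shaped like the result of one query text with
# n_results=1. A real result carries further keys, such as "distances".
sample_results = {
    "ids": [["pineapples"]],
    "documents": [["This is a document about pineapples"]],
}

print(top_hits(sample_results))
```

Inside the asset, the same helper could be applied to `results` before returning or logging the matches.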
About Chroma
Chroma is the open-source AI application database. Chroma makes it easy to build LLM apps by making knowledge, facts, and skills pluggable for LLMs. It provides a simple API for storing and querying embeddings, documents, and metadata. Chroma can be used to build semantic search, question answering, and other AI-powered applications. The database can run embedded in your application or as a separate service.
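To make "querying embeddings" concrete, here is a minimal pure-Python sketch of a similarity search: each document id is stored with a vector, and a query returns the ids whose vectors are closest to the query vector by cosine similarity. The three-dimensional vectors and the `nearest` helper are invented for illustration. Chroma derives embeddings from an embedding model and uses its own index, so this shows the idea only and is not how Chroma is implemented.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(store: dict[str, list[float]], query: list[float], n_results: int = 1) -> list[str]:
    """Return the ids of the n_results stored vectors most similar to the query."""
    ranked = sorted(
        store,
        key=lambda doc_id: cosine_similarity(store[doc_id], query),
        reverse=True,  # most similar first
    )
    return ranked[:n_results]


# Toy embeddings, made up for this sketch.
toy_store = {
    "oranges": [0.9, 0.1, 0.0],
    "pineapples": [0.2, 0.9, 0.1],
    "cucumbers": [0.0, 0.1, 0.9],
}

# A query vector that points mostly along the second axis.
toy_query = [0.1, 0.8, 0.2]

print(nearest(toy_store, toy_query))  # ['pineapples']
```

A brute-force scan like this compares the query against every stored vector, which is fine for a handful of documents but is the part a vector database replaces with an index.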