Vecworks documentation

Contents

Vecworks documentation#

Vecworks is an open-source framework build on top of Pypeworks to procedurally query vector stores in Python. It offers the following features:

  • Standardized access to various vectorization platforms, like OpenAI’s, SBERT and fastText

  • Mixing and matching of vectorization procedures, reducable to a single output using ensemblers

  • Remote serving of custom vectorization procedures using the built-in server

Install#

Vecworks is available through the PyPI repository and can be installed using pip:

pip install vecworks

Quickstart#

Vecworks’ key concept is that of the Retriever. A retriever is a specialised Pypeworks node that vectorizes inputs, allowing to cross-reference these inputs with data in a vector store. As nodes retrievers may be embedded in Pypeworks pipeworks, enabling various applications, including semantic text matching, document classification, and RAG (when combined with Langworks).

Assuming a vector store has been set-up on a PostgreSQL-database, a retriever may be instantiated as follows:

import vecworks

from vecworks.retrievers.pgvector import (
    pgvectorRetriever
)

from vecworks.vectorizers.sbert import (
    sbertVectorizer
)

match = pgvectorRetriever(

    url   = "postgresql://127.0.0.1:5432/rag-mini-wikipedia",
    # Populated using https://huggingface.co/datasets/rag-datasets/rag-mini-wikipedia

    authenticator = vecworks.auth.UsernameCredentials("username", "password"),

    table = '"text-corpus"',

    index = [

        vecworks.Index(

            name         = "passage-e5-ml-large-q",
            # Column derived from 'passage', populated with vectorized contents of 'passage'.

            bind         = "input",

            distance     = vecworks.DISTANCES.cosine,
            max_distance = 0.2,
            top_k        = 5,

            vectorizer   = sbertVectorizer.create_from_string(

                "intfloat/multilingual-e5-large",

                prompt_format = "query: "
                normalize     = True

            ),

            density      = vecworks.DENSITY.dense

        ),

    ],

    return_columns = ["passage", "id"]

    top_k       = 3

)