Sentence Transformers (SBERT)#

Sentence Transformers (SBERT) is a Python module for accessing, using, and training embedding models.

Dependencies#

To use SBERT in Vecworks, additional dependencies must be installed. You can install them using pip:

pip install vecworks[sbert]

API#

class vecworks.vectorizers.sbert.sbertVectorizer#

Wrapper class that eases the use of sentence-transformers models as vectorizers in Vecworks.

Interface#

__init__(transformer: SentenceTransformer, prompt_name: str | None = None, batch_size: int = 32, precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', normalize: bool = False, device: str = None)#

Initializes the vectorizer.

Parameters#

transformer

SentenceTransformer to be used to vectorize the data.

prompt_name

The name of the prompt to use for encoding, as specified through the prompts parameter during the initialization of the transformer.

batch_size

The number of sentences to pass simultaneously to the transformer.

precision

The precision to use during the vectorization process. Generally, the higher the precision, the better the accuracy, but at the cost of slower execution.

normalize

Whether to normalize the vectors output by the transformer.

device

torch.device to use for computation.
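To make the normalize and precision options more concrete, here is a minimal pure-Python sketch. It assumes that normalize means L2 (unit-length) normalization and that the 'ubinary' precision keeps only the sign of each value and packs eight signs into one unsigned byte, as sentence-transformers' embedding quantization does. Neither assumption is confirmed by this page, and the helper names l2_normalize and to_ubinary are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (assumed meaning of normalize=True)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)  # a zero vector cannot be scaled; return it unchanged
    return [x / norm for x in vector]


def to_ubinary(vector):
    """Keep the sign of each value and pack 8 signs per byte (assumed 'ubinary' behaviour)."""
    bits = [1 if x > 0 else 0 for x in vector]
    bits += [0] * (-len(bits) % 8)  # pad with zeros up to a multiple of 8
    packed = []
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit  # most significant bit first
        packed.append(byte)
    return packed


unit = l2_normalize([3.0, 4.0])  # a vector of length 5 scaled to length 1
packed = to_ubinary([3.0, -4.0, 0.5, -0.1, 2.0, 0.0, -1.0, 1.0])  # 8 floats become 1 byte
```

The sketch illustrates the trade-off described for precision: the packed form is far smaller than the float32 original, but it discards every magnitude and keeps only signs.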

transform(input: Any | Iterable[Any]) → ndarray#

Vectorizes the given data.

Also see: Vectorizer.
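A minimal usage sketch of the constructor and transform, based only on the signatures documented above. The model name all-MiniLM-L6-v2 is an example choice and not something this page prescribes, and the helper embed_sentences is hypothetical. The imports sit inside the function so that the optional dependencies are only needed when it is actually called.

```python
def embed_sentences(sentences, model_name="all-MiniLM-L6-v2"):
    """Vectorize sentences with sbertVectorizer (sketch; requires vecworks[sbert])."""
    # Deferred imports: these packages are only required once the function runs.
    from sentence_transformers import SentenceTransformer
    from vecworks.vectorizers.sbert import sbertVectorizer

    # Load the underlying model, then wrap it for use in Vecworks.
    transformer = SentenceTransformer(model_name)
    vectorizer = sbertVectorizer(transformer, batch_size=16, normalize=True)

    # According to the signature above, transform returns an ndarray.
    return vectorizer.transform(sentences)
```

Calling embed_sentences(["first sentence", "second sentence"]) would download the model on first use unless it is already cached locally.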

Utilities#

static create_from_string(model_name_or_path: str, prompt_format: str | None = None, batch_size: int = 32, precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', normalize: bool = False, truncate_dim: int | None = None, device: str = None, local_files_only: bool = False, trust_remote_code: bool = False, token: str | None = None) → type[sbertVectorizer]#

Initializes a SentenceTransformer from a string defining a file path or HuggingFace model, and uses this transformer to initialize an instance of sbertVectorizer.

Parameters#

model_name_or_path

File path to the on-disk model, or the name of a HuggingFace model.

prompt_format

Prompt format to apply to queries.

batch_size

The number of sentences to pass simultaneously to the transformer.

precision

The precision to use during the vectorization process. Generally, the higher the precision, the better the accuracy, but at the cost of slower execution.

normalize

Whether to normalize the vectors output by the transformer.

truncate_dim

The number of dimensions to truncate vectors to.

device

torch.device to use for computation.

local_files_only

Whether to load models from local files only, without attempting to download them.

trust_remote_code

Whether to allow remotely downloaded models to execute their own code on the local machine.

token

HuggingFace authentication token to download gated models.
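A short sketch of create_from_string, using only parameters listed above. The model name all-MiniLM-L6-v2 and the helper load_vectorizer are illustrative assumptions. prompt_format is left at its default because this page does not describe its syntax.

```python
def load_vectorizer(model_name_or_path="all-MiniLM-L6-v2"):
    """Build an sbertVectorizer directly from a model name or path (sketch)."""
    # Deferred import: vecworks[sbert] is only required once the function runs.
    from vecworks.vectorizers.sbert import sbertVectorizer

    return sbertVectorizer.create_from_string(
        model_name_or_path,
        batch_size=32,
        precision="float32",      # full precision; the other options trade accuracy for size
        normalize=True,
        device="cpu",
        local_files_only=False,   # allow downloading the model if it is not cached
        trust_remote_code=False,  # do not run code shipped with the model
    )
```

Compared with calling the constructor, this avoids creating the SentenceTransformer by hand, at the cost of exposing only the loading options listed above.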