Sentence Transformers (SBERT)#

Sentence Transformers (SBERT) is a Python module for accessing, using, and training embedding models.

Dependencies#

To use SBERT in Vecworks, additional dependencies need to be installed. You can install them with pip:

pip install vecworks[sbert]

API#

class vecworks.vectorizers.sbert.sbertVectorizer#

Wrapper class to ease use of sentence-transformers vectorizers in Vecworks.

Interface#

__init__(transformer: SentenceTransformer, prompt_name: str | None = None, batch_size: int = 32, precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', normalize: bool = False, device: str = None)#

Initializes the vectorizer.

Parameters#

transformer: SentenceTransformer to be used to vectorize the data.
prompt_name: The name of the prompt to use for encoding, as defined through the prompts parameter when the transformer was initialized.
batch_size: The number of sentences to pass simultaneously to the transformer.
precision: The precision to use during the vectorization process. Generally, the higher the precision, the better the accuracy, but at the cost of slower execution.
normalize: Whether to normalize the vectors output by the transformer.
device: torch.device to use for computation.
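The batch_size, normalize, and precision parameters can be illustrated with a small standard-library sketch. It is not the Vecworks implementation: it assumes that normalize means L2 normalization and that 'ubinary' follows the sentence-transformers convention of thresholding each dimension at zero and packing eight dimensions into one byte. The helper names batches, l2_normalize, and to_ubinary are hypothetical.

```python
import math
from typing import Iterator, List, Sequence


def batches(items: Sequence, batch_size: int) -> Iterator[list]:
    """Yield consecutive chunks of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (assumed meaning of normalize=True)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # A zero vector has no direction, so it is returned unchanged.
        return list(vector)
    return [x / norm for x in vector]


def to_ubinary(vector: Sequence[float]) -> List[int]:
    """Threshold at zero and pack eight dimensions per byte, most significant bit first.

    Assumption: this mirrors the 'ubinary' convention of sentence-transformers.
    """
    bits = [1 if x > 0 else 0 for x in vector]
    bits += [0] * (-len(bits) % 8)  # pad the last byte with zero bits
    packed = []
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return packed
```

Under these assumptions, an 8-dimensional float vector shrinks to a single byte with 'ubinary', which shows why lower precisions trade accuracy for smaller vectors.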

transform(input: Any | Iterable[Any]) → ndarray#

Vectorizes the given data.

Also see: Vectorizer.
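As a usage sketch, the constructor and transform might be combined as follows. The helper embed_queries, the model name, and the "query" prompt with its text are illustrative assumptions rather than part of Vecworks; the imports sit inside the function so that defining it does not require the optional dependencies.

```python
from typing import Iterable


def embed_queries(
    texts: Iterable[str],
    model_name: str = "sentence-transformers/all-MiniLM-L6-v2",  # illustrative model choice
):
    """Embed texts with an sbertVectorizer built from an existing SentenceTransformer."""
    # Lazy imports: both packages are only needed when the function is called.
    from sentence_transformers import SentenceTransformer
    from vecworks.vectorizers.sbert import sbertVectorizer

    # The prompt name and prompt text are hypothetical examples.
    transformer = SentenceTransformer(model_name, prompts={"query": "query: "})

    vectorizer = sbertVectorizer(
        transformer,
        prompt_name="query",  # must match a key of the prompts dictionary above
        batch_size=16,
        normalize=True,
    )
    # transform accepts a single item or an iterable of items.
    return vectorizer.transform(list(texts))
```

Calling embed_queries(["first text", "second text"]) should then return an ndarray, presumably with one row per input text.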

Utilities#

static create_from_string(model_name_or_path: str, prompt_format: str | None = None, batch_size: int = 32, precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', normalize: bool = False, truncate_dim: int | None = None, device: str = None, local_files_only: bool = False, trust_remote_code: bool = False, token: str | None = None) → type[sbertVectorizer]#

Initializes a SentenceTransformer from a string defining a file path or HuggingFace model, and uses this transformer to initialize an instance of sbertVectorizer.

Parameters#

model_name_or_path: File path to the on-disk model, or otherwise the name of a HuggingFace model.
prompt_format: Prompt format to apply to queries.
batch_size: The number of sentences to pass simultaneously to the transformer.
precision: The precision to use during the vectorization process. Generally, the higher the precision, the better the accuracy, but at the cost of slower execution.
normalize: Whether to normalize the vectors output by the transformer.
truncate_dim: The number of dimensions to truncate vectors to.
device: torch.device to use for computation.
local_files_only: Whether or not to only look at local files for loading models.
trust_remote_code: Whether or not to allow remotely downloaded models to execute code locally.
token: HuggingFace authentication token to download gated models.
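A minimal sketch of calling create_from_string with the parameters listed above. The helper load_vectorizer and the chosen argument values are assumptions made for illustration; the import is deferred so the helper can be defined without the optional dependencies installed.

```python
def load_vectorizer(model_name_or_path: str, offline: bool = False):
    """Build an sbertVectorizer directly from a model name or an on-disk path."""
    # Lazy import: vecworks[sbert] is only needed when the function is called.
    from vecworks.vectorizers.sbert import sbertVectorizer

    return sbertVectorizer.create_from_string(
        model_name_or_path,
        batch_size=64,
        precision="float32",
        normalize=True,
        truncate_dim=256,          # illustrative value; keep only 256 dimensions
        device="cpu",
        local_files_only=offline,  # True skips any download attempt
        trust_remote_code=False,   # do not run code shipped with the model
    )
```

For example, load_vectorizer("sentence-transformers/all-MiniLM-L6-v2") would fetch that model from HuggingFace, while passing a local directory together with offline=True keeps loading on disk. Leaving trust_remote_code at False is the more cautious choice unless the model source is trusted.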
