Sentence Transformers (SBERT)#
Sentence Transformers (SBERT) is a Python module for accessing, using, and training embedding models.
Dependencies#
To use SBERT in Vecworks, additional dependencies need to be installed. You can install them with pip:
pip install vecworks[sbert]
API#
- class vecworks.vectorizers.sbert.sbertVectorizer#
Wrapper class to ease use of sentence-transformers vectorizers in Vecworks.
Interface#
- __init__(transformer: SentenceTransformer, prompt_name: str | None = None, batch_size: int = 32, precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', normalize: bool = False, device: str = None)#
Initializes the vectorizer.
Parameters#
- transformer
SentenceTransformer to be used to vectorize the data.
- prompt_name
The name of the prompt to use for encoding, as defined through the prompts parameter when the transformer was initialized.
- batch_size
The number of sentences to pass simultaneously to the transformer.
- precision
The precision to use during the vectorization process. Generally, higher precision gives more accurate vectors, but at the cost of slower execution.
- normalize
Whether to normalize the vectors output by the transformer.
- device
Name of the torch device to use for computation (for example "cpu" or "cuda").
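To make the normalize and precision options more concrete, the sketch below shows in plain Python what L2 normalization and the 'ubinary' precision do conceptually. It assumes these options behave like the corresponding normalize_embeddings and precision settings of sentence-transformers, which this page does not confirm. The helper names l2_normalize and pack_ubinary are hypothetical and are not part of Vecworks.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def pack_ubinary(vector):
    """Keep only the sign of each component and pack 8 components per byte.

    Assumption: components > 0 become bit 1, everything else bit 0,
    with the first component stored in the most significant bit.
    """
    bits = [1 if x > 0 else 0 for x in vector]
    # Pad on the right with zeros up to a multiple of 8.
    bits += [0] * (-len(bits) % 8)
    packed = []
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return packed


# For unit-length vectors the dot product equals the cosine similarity.
a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])
cosine = sum(x * y for x, y in zip(a, b))

# Eight float components collapse into a single unsigned byte.
codes = pack_ubinary([0.2, -0.1, 0.7, -0.3, 0.0, 0.5, -0.9, 0.1])
```

Under these assumptions, normalized vectors can be compared with a plain dot product, and binary precisions trade accuracy for much smaller vectors.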
- transform(input: Any | Iterable[Any]) → ndarray#
Vectorizes the given data.
Also see: Vectorizer.
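As a minimal sketch of the interface above, the function below wraps a SentenceTransformer in an sbertVectorizer and vectorizes a list of strings. The model name, the "query" prompt, and the helper name embed_queries are illustrative assumptions. The imports are placed inside the function so that the snippet can be loaded without the optional dependencies installed.

```python
def embed_queries(texts):
    """Vectorize a list of strings with an sbertVectorizer (illustrative sketch)."""
    # Imported lazily: both packages require `pip install vecworks[sbert]`.
    from sentence_transformers import SentenceTransformer
    from vecworks.vectorizers.sbert import sbertVectorizer

    # Assumption: any sentence-transformers model works here. The prompt
    # "query" is defined on the transformer so prompt_name can refer to it.
    transformer = SentenceTransformer(
        "sentence-transformers/all-MiniLM-L6-v2",
        prompts={"query": "query: "},
    )

    vectorizer = sbertVectorizer(
        transformer,
        prompt_name="query",   # must match a key of `prompts` above
        batch_size=16,         # sentences passed to the model at once
        precision="float32",
        normalize=True,
    )

    # According to the signature above, transform returns a NumPy ndarray.
    return vectorizer.transform(texts)
```

Calling embed_queries(["What is a vector index?"]) would then be expected to return the embeddings as an ndarray. The exact shape depends on the chosen model.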
Utilities#
- static create_from_string(model_name_or_path: str, prompt_format: str | None = None, batch_size: int = 32, precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', normalize: bool = False, truncate_dim: int | None = None, device: str = None, local_files_only: bool = False, trust_remote_code: bool = False, token: str | None = None) → type[sbertVectorizer]#
Initializes a SentenceTransformer from a string defining a file path or HuggingFace model, and uses this transformer to initialize an instance of sbertVectorizer.
Parameters#
- model_name_or_path
File path to the on-disk model, or otherwise the name of a HuggingFace model.
- prompt_format
Prompt format to apply to queries.
- batch_size
The number of sentences to pass simultaneously to the transformer.
- precision
The precision to use during the vectorization process. Generally, higher precision gives more accurate vectors, but at the cost of slower execution.
- normalize
Whether to normalize the vectors output by the transformer.
- truncate_dim
The number of dimensions to truncate the vectors to.
- device
Name of the torch device to use for computation (for example "cpu" or "cuda").
- local_files_only
Whether to load models only from local files, without attempting to download them.
- trust_remote_code
Whether to allow remotely downloaded models to execute their own code on the local machine.
- token
HuggingFace authentication token to download gated models.
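As a minimal sketch of this utility, the function below builds a vectorizer directly from a model name, without constructing the SentenceTransformer by hand. The default model name and the helper name load_vectorizer are illustrative assumptions. Only parameters listed in the signature above are used, and the import is placed inside the function so that the snippet can be loaded without the optional dependencies installed.

```python
def load_vectorizer(model_name_or_path="sentence-transformers/all-MiniLM-L6-v2"):
    """Create an sbertVectorizer from a model name or local path (illustrative sketch)."""
    # Imported lazily: requires `pip install vecworks[sbert]`.
    from vecworks.vectorizers.sbert import sbertVectorizer

    return sbertVectorizer.create_from_string(
        model_name_or_path,
        batch_size=64,
        precision="float32",
        normalize=True,
        # Set to True to forbid downloads and rely on files already on disk.
        local_files_only=False,
        # Leave disabled unless the model's custom code is trusted.
        trust_remote_code=False,
    )
```

The returned object can then be used like any other vectorizer, for instance load_vectorizer().transform(["some text"]). For gated models, a HuggingFace token would additionally be passed through the token parameter.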