Balancer#
- class langworks.middleware.balancer.Balancer#
Middleware that distributes queries among other middleware, enabling load balancing. Optionally, load balancing may be enhanced with autoscaling, which controls the rate at which middleware are made available or unavailable.
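For illustration, a minimal setup might look like the sketch below. The objects `mw_primary` and `mw_secondary` are assumed placeholders for any instantiated Middleware, as the concrete middleware classes depend on your deployment:

```python
from langworks.middleware.balancer import Balancer

# `mw_primary` and `mw_secondary` are assumed placeholders for instantiated
# Middleware objects, e.g. connections to two separate inference servers.
balancer = Balancer(
    middleware=[mw_primary, mw_secondary]  # mw_primary is given priority
)
```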
Fundamentals#
- __init__(middleware: Sequence[Middleware], autoscale_threshold: tuple[float, float] = (0, 0))#
Initializes the Balancer.
Parameters#
- middleware
Instantiated middleware to which queries may be distributed, giving priority to middleware specified first.
- autoscale_threshold
Pair of thresholds specifying at what number of queries per middleware the balancer scales up (first item) or down (second item). By default this is set to (0, 0), configuring the balancer to immediately scale up to use all available middleware, while never scaling down (see the sketch below).
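As a sketch of how these thresholds interact (middleware placeholders assumed, as above), the following configures the balancer to bring an additional middleware into rotation once the load exceeds 8 queries per active middleware, and to retire one when it drops below 2:

```python
balancer = Balancer(
    middleware=[mw_a, mw_b, mw_c],  # assumed instantiated Middleware objects
    autoscale_threshold=(8, 2),     # scale up above 8 queries per middleware,
                                    # scale down below 2
)
```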
Methods#
- exec(query: str = None, role: str = None, guidance: str = None, history: Thread = None, context: dict = None, params: SamplingParams = None) → tuple[Thread, dict[str, Any]]#
Generates a new message, following up on the message passed, using the given guidance and sampling parameters. A complete call is sketched after the parameter list below.
Parameters#
- query
The query to prompt the LLM with, optionally formatted using Langworks’ static DSL.
- role
The role of the agent stating this query, usually ‘user’, ‘system’ or ‘assistant’.
- guidance
Template for the message to be generated, formatted using Langworks’ dynamic DSL.
- history
Conversational history (thread) to prepend to the prompt.
- context
Context to reference when filling in the templated parts of the query, guidance and history. In case the Langwork or the input also defines a context, the available contexts are merged. When duplicate attributes are observed, the value is copied from the most specific context, i.e. the input context takes precedence over the Query context, and the Query context over the Langwork context.
- params
Sampling parameters, wrapped by a SamplingParams object, specifying how the LLM should select subsequent tokens.
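Putting it together, a call to exec could look like the sketch below. The import path for SamplingParams and its temperature field are assumptions based on this page, not verified against the package:

```python
from langworks.middleware.generic import SamplingParams  # assumed import path

# `balancer` is a Balancer instantiated as shown under Fundamentals.
thread, context = balancer.exec(
    query="Summarize the following text: {{ text }}",  # static DSL templating
    role="user",
    context={"text": "Langworks distributes queries across middleware."},
    params=SamplingParams(temperature=0.2),            # `temperature` is assumed
)
# Per the signature, `thread` holds the resulting conversational history and
# `context` the accompanying dictionary.
```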