openapi: "3.0.3" info: title: Aleph Alpha API version: "1.16.1" description: Access and interact with Aleph Alpha models and functionality over HTTP endpoints. contact: email: support@aleph-alpha.com servers: - url: https://api.aleph-alpha.com components: securitySchemes: token: type: http scheme: bearer description: Can be generated in your [Aleph Alpha profile](https://app.aleph-alpha.com/profile) schemas: Hosting: type: string nullable: true enum: ["aleph-alpha", null] description: | Optional parameter that specifies which datacenters may process the request. You can either set the parameter to "aleph-alpha" or omit it (defaulting to `null`). Not setting this value, or setting it to `null`, gives us maximal flexibility in processing your request in our own datacenters and on servers hosted with other providers. Choose this option for maximum availability. Setting it to "aleph-alpha" allows us to only process the request in our own datacenters. Choose this option for maximal data privacy. MultimodalPrompt: title: Multimodal type: array description: An array of prompt items for multimodal request. Can support any combination of text, images, and token ids. items: oneOf: - $ref: "#/components/schemas/TextPromptItem" - $ref: "#/components/schemas/ImagePromptItem" - $ref: "#/components/schemas/TokenIdsPromptItem" TextPromptItem: type: object title: Text required: - type - data properties: type: type: string enum: [text] data: type: string controls: type: array items: type: object required: - start - length - factor properties: start: type: integer description: Starting character index to apply the factor to. length: type: integer description: The amount of characters to apply the factor to. factor: type: number description: | Factor to apply to the given token in the attention matrix. - 0 <= factor < 1 => Suppress the given token - factor == 1 => identity operation, no change to attention - factor > 1 => Amplify the given token token_overlap: type: string enum: [partial, complete] default: partial description: | What to do if a control partially overlaps with a text token. If set to "partial", the factor will be adjusted proportionally with the amount of the token it overlaps. So a factor of 2.0 of a control that only covers 2 of 4 token characters, would be adjusted to 1.5. (It always moves closer to 1, since 1 is an identity operation for control factors.) If set to "complete", the full factor will be applied as long as the control overlaps with the token at all. ImagePromptItem: type: object title: Image required: - type - data properties: type: type: string enum: [image] data: type: string description: | An image send as part of a prompt to a model. The image is represented as base64. Note: The models operate on square images. All non-square images are center-cropped before going to the model, so portions of the image may not be visible. You can supply specific cropping parameters if you like, to choose a different area of the image than a center-crop. Or, you can always transform the image yourself to a square before sending it. x: type: integer description: x-coordinate of top left corner of cropping box in pixels y: type: integer description: y-coordinate of top left corner of cropping box in pixels size: type: integer description: Size of the cropping square in pixels controls: type: array items: type: object required: - rect - factor properties: rect: type: object required: - left - top - width - height description: | Bounding box in logical coordinates. From 0 to 1. 
With (0,0) being the upper left corner, and relative to the entire image. Keep in mind that non-square images are center-cropped by default before going to the model. (You can specify a custom cropping if you want.) Since control coordinates are relative to the entire image, all or a portion of your control may be outside the "model visible area". properties: left: type: number description: | x-coordinate of top left corner of the control bounding box. Must be a value between 0 and 1, where 0 is the left corner and 1 is the right corner. top: type: number description: | y-coordinate of top left corner of the control bounding box. Must be a value between 0 and 1, where 0 is the top pixel row and 1 is the bottom row. width: type: number description: | Width of the control bounding box. Must be a value between 0 and 1, where 1 means the full width of the image. height: type: number description: | Height of the control bounding box. Must be a value between 0 and 1, where 1 means the full height of the image. factor: type: number description: | Factor to apply to the given token in the attention matrix. - 0 <= factor < 1 => Suppress the given token - factor == 1 => identity operation, no change to attention - factor > 1 => Amplify the given token token_overlap: type: string enum: [partial, complete] default: partial description: | What to do if a control partially overlaps with an image token. If set to "partial", the factor will be adjusted proportionally with the amount of the token it overlaps. So a factor of 2.0 for a control that only covers half of the image "tile" would be adjusted to 1.5. (It always moves closer to 1, since 1 is an identity operation for control factors.) If set to "complete", the full factor will be applied as long as the control overlaps with the token at all. TokenIdsPromptItem: type: object title: Token Ids required: - type - data properties: type: type: string enum: [token_ids] data: type: array items: type: integer controls: type: array items: type: object required: - index - factor properties: index: type: integer description: | Index of the token, relative to the list of token IDs in the current prompt item. factor: type: number description: | Factor to apply to the given token in the attention matrix. - 0 <= factor < 1 => Suppress the given token - factor == 1 => identity operation, no change to attention - factor > 1 => Amplify the given token Prompt: description: | This field is used to send prompts to the model. A prompt can either be a text prompt or a multimodal prompt. A text prompt is a string of text. A multimodal prompt is an array of prompt items. It can be a combination of text, images, and token ID arrays. In the case of a multimodal prompt, the prompt items will be concatenated and a single prompt will be used for the model. Tokenization: - Token ID arrays are used as-is. - Text prompt items are tokenized using the tokenizers specific to the model. - Each image is converted into 144 tokens. oneOf: - title: Text Prompt type: string description: The text to be completed. Unconditional completion can be started with an empty string (default). The prompt may contain a zero shot or few shot task. - $ref: "#/components/schemas/MultimodalPrompt" OptimizedPrompt: description: Describes the prompt after optimizations. This field is only returned if the `disable_optimizations` flag is not set and the prompt has actually changed.
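# Illustrative only, not part of the schema definitions above: a minimal Python sketch of a
# multimodal prompt as accepted by the Prompt/MultimodalPrompt schemas (text, image, and an
# optional text control). The base64 string is a placeholder for a real encoded image.
#
#   prompt = [
#       {"type": "text", "data": "Describe the following image:"},
#       {"type": "image", "data": "<base64-encoded image>"},
#       {"type": "text", "data": "A detailed description:",
#        "controls": [{"start": 0, "length": 8, "factor": 1.5}]},
#   ]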
type: array items: oneOf: - type: object title: Text properties: type: type: string enum: [text] data: type: string - type: object title: Image properties: type: type: string enum: [image] data: type: string description: base64 encoded image - type: object title: Token Ids properties: type: type: string enum: [token_ids] data: type: array items: type: integer Document: description: | Valid document formats for tasks like Q&A and Summarization. These can be one of the following formats: - Docx: A base64 encoded Docx file - Text: A string of text - Prompt: A multimodal prompt, as is used in our other tasks like Completion Documents of types Docx and Text are usually preferred, and will have optimizations (such as chunking) applied to work better with the respective task that is being run. Prompt documents are assumed to be used for advanced use cases, and will be left as-is. example: { "text": "Some people like pizza more than burgers. Other people don't. But we all love food.", } oneOf: - type: object title: Docx properties: docx: type: string format: base64 - type: object title: Text properties: text: type: string - type: object title: Prompt properties: prompt: $ref: "#/components/schemas/MultimodalPrompt" CompletionRequest: type: object example: model: luminous-base prompt: An apple a day maximum_tokens: 64 properties: model: type: string description: | The name of the model from the Luminous model family. Models and their respective architectures can differ in parameter size and capabilities. The most recent version of the model is always used. The model output contains information as to the model version. hosting: $ref: "#/components/schemas/Hosting" prompt: $ref: "#/components/schemas/Prompt" maximum_tokens: type: integer description: | The maximum number of tokens to be generated. Completion will terminate after the maximum number of tokens is reached. Increase this value to generate longer texts. A text is split into tokens. Usually there are more tokens than words. The sum of input tokens and maximum_tokens may not exceed 2048. minimum_tokens: type: integer default: 0 description: Generate at least this number of tokens before an end-of-text token is generated. echo: type: boolean default: false description: | Echo the prompt in the completion. This may be especially helpful when log_probs is set to return logprobs for the prompt. temperature: type: number default: 0.0 nullable: true description: A higher sampling temperature encourages the model to produce less probable outputs ("be more creative"). Values are expected in a range from 0.0 to 1.0. Try high values (e.g., 0.9) for a more "creative" response and the default 0.0 for a well defined and repeatable answer. It is advised to use either temperature, top_k, or top_p, but not all three at the same time. If a combination of temperature, top_k or top_p is used, rescaling of logits with temperature will be performed first. Then top_k is applied. Top_p follows last. top_k: type: integer default: 0 nullable: true description: Introduces random sampling for generated tokens by randomly selecting the next token from the k most likely options. A value larger than 1 encourages the model to be more creative. Set to 0.0 if repeatable output is desired. It is advised to use either temperature, top_k, or top_p, but not all three at the same time. If a combination of temperature, top_k or top_p is used, rescaling of logits with temperature will be performed first. Then top_k is applied. Top_p follows last. 
top_p: type: number default: 0.0 nullable: true description: Introduces random sampling for generated tokens by randomly selecting the next token from the smallest possible set of tokens whose cumulative probability exceeds the probability top_p. Set to 0.0 if repeatable output is desired. It is advised to use either temperature, top_k, or top_p, but not all three at the same time. If a combination of temperature, top_k or top_p is used, rescaling of logits with temperature will be performed first. Then top_k is applied. Top_p follows last. presence_penalty: type: number default: 0.0 nullable: true description: | The presence penalty reduces the likelihood of generating tokens that are already present in the generated text (`repetition_penalties_include_completion=true`) respectively the prompt (`repetition_penalties_include_prompt=true`). Presence penalty is independent of the number of occurrences. Increase the value to reduce the likelihood of repeating text. An operation like the following is applied: logits[t] -> logits[t] - 1 * penalty where `logits[t]` is the logits for any given token. Note that the formula is independent of the number of times that a token appears. frequency_penalty: type: number default: 0.0 nullable: true description: | The frequency penalty reduces the likelihood of generating tokens that are already present in the generated text (`repetition_penalties_include_completion=true`) respectively the prompt (`repetition_penalties_include_prompt=true`). If `repetition_penalties_include_prompt=True`, this also includes the tokens in the prompt. Frequency penalty is dependent on the number of occurrences of a token. An operation like the following is applied: logits[t] -> logits[t] - count[t] * penalty where `logits[t]` is the logits for any given token and `count[t]` is the number of times that token appears. sequence_penalty: type: number default: 0.0 description: | Increasing the sequence penalty reduces the likelihood of reproducing token sequences that already appear in the prompt (if repetition_penalties_include_prompt is True) and prior completion. sequence_penalty_min_length: type: integer default: 2 description: | Minimal number of tokens to be considered as sequence repetition_penalties_include_prompt: type: boolean default: false nullable: true description: Flag deciding whether presence penalty or frequency penalty are updated from tokens in the prompt repetition_penalties_include_completion: type: boolean default: true description: Flag deciding whether presence penalty or frequency penalty are updated from tokens in the completion use_multiplicative_presence_penalty: type: boolean default: false nullable: true description: Flag deciding whether presence penalty is applied multiplicatively (True) or additively (False). This changes the formula stated for presence penalty. use_multiplicative_frequency_penalty: type: boolean default: false description: Flag deciding whether frequency penalty is applied multiplicatively (True) or additively (False). This changes the formula stated for frequency penalty. use_multiplicative_sequence_penalty: type: boolean default: false description: Flag deciding whether sequence penalty is applied multiplicatively (True) or additively (False). penalty_bias: type: string nullable: true default: null description: | All tokens in this text will be used in addition to the already penalized tokens for repetition penalties. 
These consist of the already generated completion tokens and the prompt tokens, if `repetition_penalties_include_prompt` is set to `true`. penalty_exceptions: type: array nullable: true items: type: string description: | List of strings that may be generated without penalty, regardless of other penalty settings. By default, we will also include any `stop_sequences` you have set, since completion performance can be degraded if expected stop sequences are penalized. You can disable this behavior by setting `penalty_exceptions_include_stop_sequences` to `false`. penalty_exceptions_include_stop_sequences: type: boolean default: true nullable: true description: | By default we include all `stop_sequences` in `penalty_exceptions`, so as not to penalise the presence of stop sequences that are present in few-shot prompts to give structure to your completions. You can set this to `false` if you do not want this behaviour. See the description of `penalty_exceptions` for more information on what `penalty_exceptions` are used for. best_of: type: integer nullable: true default: 1 maximum: 100 description: If a value is given, the number of `best_of` completions will be generated on the server side. The completion with the highest log probability per token is returned. If the parameter `n` is greater than 1, then `n` completions will be returned. `best_of` must be strictly greater than `n`. n: type: integer default: 1 nullable: true description: The number of completions to return. If argmax sampling is used (temperature, top_k, top_p are all default) the same completions will be produced. This parameter should only be increased if random sampling is used. logit_bias: type: object nullable: true log_probs: type: integer default: null nullable: true description: Number of top log probabilities for each token generated. Log probabilities can be used in downstream tasks or to assess the model's certainty when producing tokens. No log probabilities are returned if set to `null`. Log probabilities of generated tokens are returned if set to 0. Log probabilities of generated tokens and top n log probabilities are returned if set to n. stop_sequences: type: array nullable: true description: | List of strings that will stop generation if they're generated. Stop sequences may be helpful in structured texts. items: type: string tokens: type: boolean default: false nullable: true description: Flag indicating whether individual tokens of the completion should be returned (True) or whether solely the generated text (i.e. the completion) is sufficient (False). raw_completion: type: boolean default: false description: | Setting this parameter to true forces the raw completion of the model to be returned. For some models, we may optimize the completion that was generated by the model and return the optimized completion in the completion field of the `CompletionResponse`. The raw completion, if returned, will contain the un-optimized completion. Setting tokens to true or log_probs to any value will also trigger the raw completion to be returned. disable_optimizations: type: boolean default: false nullable: true description: | We continually research optimal ways to work with our models. By default, we apply these optimizations to both your prompt and completion for you. Our goal is to improve your results while using our API. But you can always pass `disable_optimizations: true` and we will leave your prompt and completion untouched.
completion_bias_inclusion: type: array items: type: string default: [] description: | Bias the completion to only generate options within this list; all other tokens are disregarded at sampling Note that strings in the inclusion list must not be prefixes of strings in the exclusion list and vice versa completion_bias_inclusion_first_token_only: type: boolean default: false description: | Only consider the first token for the completion_bias_inclusion completion_bias_exclusion: type: array items: type: string default: [] description: | Bias the completion to NOT generate options within this list; all other tokens are unaffected in sampling Note that strings in the inclusion list must not be prefixes of strings in the exclusion list and vice versa completion_bias_exclusion_first_token_only: type: boolean default: false description: | Only consider the first token for the completion_bias_exclusion contextual_control_threshold: type: number default: null nullable: true description: | If set to `null`, attention control parameters only apply to those tokens that have explicitly been set in the request. If set to a non-null value, we apply the control parameters to similar tokens as well. Controls that have been applied to one token will then be applied to all other tokens that have at least the similarity score defined by this parameter. The similarity score is the cosine similarity of token embeddings. control_log_additive: type: boolean default: true description: | `true`: apply controls on prompt items by adding the `log(control_factor)` to attention scores. `false`: apply controls on prompt items by `(attention_scores - -attention_scores.min(-1)) * control_factor` required: - model - prompt - maximum_tokens CompletionResponse: type: object example: completions: [ { completion: "keeps the doctor away,", finish_reason: maximum_tokens, }, ] model_version: 2021-12 optimized_prompt: An apple a day num_tokens_prompt_total: 4 num_tokens_generated: 5 properties: model_version: type: string description: model name and version (if any) of the used model for inference completions: type: array description: list of completions; may contain only one entry if no more are requested (see parameter n) items: type: object properties: log_probs: type: object nullable: true description: list with a dictionary for each generated token. The dictionary maps the keys' tokens to the respective log probabilities. This field is only returned if requested with the parameter "log_probs". completion: type: string nullable: false description: generated completion on the basis of the prompt raw_completion: type: string nullable: true description: | For some models, we may optimize the completion that was generated by the model and return the optimized completion in the completion field of the CompletionResponse. The raw completion, if returned, will contain the un-optimized completion. Setting the parameter `raw_completion` in the CompletionRequest to true forces the raw completion of the model to be returned. Setting tokens to true or log_probs to any value will also trigger the raw completion to be returned. completion_tokens: type: array items: type: string description: completion split into tokens. This field is only returned if requested with the parameter "tokens". finish_reason: type: string nullable: true description: reason for termination of generation. This may be a stop sequence or maximum number of tokens reached. 
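# Illustrative only, not part of the spec: a minimal Python sketch of calling /complete, reusing
# the CompletionRequest and CompletionResponse example values above. <AA_TOKEN> is a placeholder
# for an API token generated in your Aleph Alpha profile.
#
#   import requests
#
#   response = requests.post(
#       "https://api.aleph-alpha.com/complete",
#       headers={"Authorization": "Bearer <AA_TOKEN>"},
#       json={"model": "luminous-base", "prompt": "An apple a day", "maximum_tokens": 64},
#   )
#   # e.g. "keeps the doctor away," with finish_reason "maximum_tokens"
#   print(response.json()["completions"][0]["completion"])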
optimized_prompt: $ref: "#/components/schemas/OptimizedPrompt" num_tokens_prompt_total: type: integer description: | Number of tokens combined across all completion tasks. In particular, if you set best_of or n to a number larger than 1 then we report the combined prompt token count for all best_of or n tasks. Tokenization: - Token ID arrays are used as as-is. - Text prompt items are tokenized using the tokenizers specific to the model. - Each image is converted into a fixed amount of tokens that depends on the chosen model. num_tokens_generated: type: integer description: | Number of tokens combined across all completion tasks. If multiple completions are returned or best_of is set to a value greater than 1 then this value contains the combined generated token count. ExplanationRequest: type: object properties: model: type: string description: Name of the model to use. hosting: type: string nullable: true enum: ["aleph-alpha"] description: | Determines in which datacenters the request may be processed. You can either set the parameter to "aleph-alpha" or omit it (defaulting to None). Not setting this value, or setting it to None, gives us maximal flexibility in processing your request in our own datacenters and on servers hosted with other providers. Choose this option for maximal availability. Setting it to "aleph-alpha" allows us to only process the request in our own datacenters. Choose this option for maximal data privacy. prompt: $ref: "#/components/schemas/Prompt" target: type: string description: The completion string to be explained based on model probabilities. nullable: true control_factor: type: number default: 0.1 description: | Factor to apply to the given token in the attention matrix. - 0 <= factor < 1 => Suppress the given token - factor == 1 => identity operation, no change to attention - factor > 1 => Amplify the given token contextual_control_threshold: type: number default: null nullable: true description: | If set to `null`, attention control parameters only apply to those tokens that have explicitly been set in the request. If set to a non-null value, we apply the control parameters to similar tokens as well. Controls that have been applied to one token will then be applied to all other tokens that have at least the similarity score defined by this parameter. The similarity score is the cosine similarity of token embeddings. control_log_additive: type: boolean default: true description: | `true`: apply controls on prompt items by adding the `log(control_factor)` to attention scores. `false`: apply controls on prompt items by `(attention_scores - -attention_scores.min(-1)) * control_factor` postprocessing: type: string enum: ["none", "absolute", "square"] default: "none" description: | Optionally apply postprocessing to the difference in cross entropy scores for each token. "none": Apply no postprocessing. "absolute": Return the absolute value of each value. "square": Square each value normalize: type: boolean default: false description: | Return normalized scores. Minimum score becomes 0 and maximum score becomes 1. Applied after any postprocessing prompt_granularity: type: object properties: type: type: string enum: ["token", "word", "sentence", "paragraph", "custom"] description: | At which granularity should the target be explained in terms of the prompt. If you choose, for example, "sentence" then we report the importance score of each sentence in the prompt towards generating the target output. 
If you do not choose a granularity then we will try to find the granularity that brings you closest to around 30 explanations. For large documents, this would likely be sentences. For short prompts this might be individual words or even tokens. If you choose a custom granularity then you must provide a custom delimiter. We then split your prompt by that delimiter. This might be helpful if you are using few-shot prompts that contain stop sequences. For image prompt items, the granularities determine into how many tiles we divide the image for the explanation. "token" -> 12x12 "word" -> 6x6 "sentence" -> 3x3 "paragraph" -> 1 delimiter: type: string description: | A delimiter string to split the prompt on if "custom" granularity is chosen. target_granularity: type: string enum: ["complete", "token"] default: "complete" description: | How many explanations should be returned in the output. "complete" -> Return one explanation for the entire target. Helpful in many cases to determine which parts of the prompt contribute overall to the given completion. "token" -> Return one explanation for each token in the target. control_token_overlap: type: string enum: [partial, complete] default: partial description: | What to do if a control partially overlaps with a text or image token. If set to "partial", the factor will be adjusted proportionally with the amount of the token it overlaps. So a factor of 2.0 of a control that only covers 2 of 4 token characters, would be adjusted to 1.5. (It always moves closer to 1, since 1 is an identity operation for control factors.) If set to "complete", the full factor will be applied as long as the control overlaps with the token at all. required: - model - prompt - target TokenIdsPromptItemImportance: type: object description: | Explains the importance of a request prompt item of type "token_ids". Will contain one floating point importance value for each token in the same order as in the original prompt. properties: type: type: string enum: ["token_ids"] scores: type: array items: type: number TargetItemImportance: type: object description: | Explains the importance of text in the target string that came before the currently to-be-explained target token. The amount of items in the "scores" array depends on the granularity setting. Each score object contains an inclusive start character and a length of the substring plus a floating point score value. properties: type: type: string enum: ["target"] scores: type: array items: type: object properties: start: type: integer length: type: integer score: type: number TextPromptItemImportance: type: object description: | Explains the importance of a text prompt item. The amount of items in the "scores" array depends on the granularity setting. Each score object contains an inclusive start character and a length of the substring plus a floating point score value. properties: type: type: string enum: ["text"] scores: type: array items: type: object properties: start: type: integer length: type: integer score: type: number ImagePromptItemImportance: type: object description: | Explains the importance of an image prompt item. The amount of items in the "scores" array depends on the granularity setting. Each score object contains the top-left corner of a rectangular area in the image prompt. 
The coordinates are all between 0 and 1 in terms of the total image size properties: type: type: string enum: ["image"] scores: type: array items: type: object properties: rect: type: object properties: top: type: number left: type: number width: type: number height: type: number score: type: number ExplanationResponse: description: | The top-level response data structure that will be returned from an explanation request. type: object properties: model_version: type: string explanations: description: | This array will contain one explanation object for each token in the target string. type: array items: type: object properties: target: description: | The string representation of the target token which is being explained type: string items: description: | Contains one item for each prompt item (in order), and the last item refers to the target. type: array items: oneOf: - $ref: "#/components/schemas/TokenIdsPromptItemImportance" - $ref: "#/components/schemas/TargetItemImportance" - $ref: "#/components/schemas/TextPromptItemImportance" - $ref: "#/components/schemas/ImagePromptItemImportance" QARequest: type: object example: query: "Who likes Pizza?" documents: [ { "text": "Andreas likes Pizza." }, { "docx": "b64;base64EncodededWordDocument" }, ] properties: hosting: $ref: "#/components/schemas/Hosting" query: type: string description: The question to be answered about the prompt by the model. The prompt may not contain a valid answer. documents: type: array items: $ref: "#/components/schemas/Document" description: | A list of documents. Valid document formats for tasks like Q&A and Summarization. These can be one of the following formats: - Docx: A base64 encoded Docx file - Text: A string of text - Prompt: A multimodal prompt, as is used in our other tasks like Completion Docx and Text documents are usually preferred and have optimisations (such as chunking) applied to make them work better with the task being performed. Prompt documents are assumed to be used for advanced use cases, and will be left as-is. max_answers: type: integer default: 30 minimum: 1 maximum: 200 description: The maximum number of answers to return for this query. A smaller number of max answers can possibly return answers sooner, since less answers have to be generated. required: - query - documents QAResponse: type: object example: answers: [ { answer: Andreas, score: 0.9980973, evidence: "Andreas likes Pizza.", }, ] model_version: 2021-12 properties: model_version: type: string description: model name and version (if any) of the used model for inference answers: type: array description: list of answers. One answer per chunk. items: type: object properties: answer: type: string description: The answer generated by the model for a given chunk. score: type: number format: float description: quality score of the answer evidence: type: string description: The evidence from the source document for the given answer. required: - answer - score - evidence EmbeddingRequest: type: object example: model: luminous-base prompt: An apple a day keeps the doctor away. layers: [0, 1] tokens: false pooling: ["max"] type: "default" properties: model: type: string description: Name of model to use. A model name refers to a model architecture (number of parameters among others). Always the latest version of model is used. The model output contains information as to the model version. 
hosting: $ref: "#/components/schemas/Hosting" prompt: $ref: "#/components/schemas/Prompt" layers: type: array items: type: integer description: | A list of layer indices from which to return embeddings. - Index 0 corresponds to the word embeddings used as input to the first transformer layer - Index 1 corresponds to the hidden state as output by the first transformer layer, index 2 to the output of the second layer etc. - Index -1 corresponds to the last transformer layer (not the language modelling head), index -2 to the second-to-last layer tokens: type: boolean nullable: true description: Flag indicating whether the tokenized prompt is to be returned (True) or not (False) pooling: type: array items: type: string description: | Pooling operation to use. Pooling operations include: - mean: Aggregate token embeddings across the sequence dimension using an average. - weighted_mean: Position weighted mean across sequence dimension with latter tokens having a higher weight. - max: Aggregate token embeddings across the sequence dimension using a maximum. - last_token: Use the last token. - abs_max: Aggregate token embeddings across the sequence dimension using a maximum of absolute values. type: type: string nullable: true description: | Explicitly set embedding type to be passed to the model. This parameter was created to allow for semantic_embed embeddings and will be deprecated. Please use the semantic_embed endpoint instead. normalize: type: boolean default: false description: | Return normalized embeddings. This can be used to save on additional compute when applying a cosine similarity metric. contextual_control_threshold: type: number default: null nullable: true description: | If set to `null`, attention control parameters only apply to those tokens that have explicitly been set in the request. If set to a non-null value, we apply the control parameters to similar tokens as well. Controls that have been applied to one token will then be applied to all other tokens that have at least the similarity score defined by this parameter. The similarity score is the cosine similarity of token embeddings. control_log_additive: type: boolean default: true description: | `true`: apply controls on prompt items by adding the `log(control_factor)` to attention scores. `false`: apply controls on prompt items by `(attention_scores - -attention_scores.min(-1)) * control_factor` required: - model - prompt EmbeddingResponse: type: object example: model_version: 2021-12 embeddings: { layer_0: { max: [ -0.053497314, 0.0053749084, 0.06427002, 0.05316162, -0.0044059753, ..., ], }, layer_1: { max: [ 0.14086914, -0.24780273, 1.3232422, -0.07055664, 1.2148438, ..., ], }, } tokens: null num_tokens_prompt_total: 42 properties: model_version: type: string description: model name and version (if any) of the used model for inference embeddings: type: object nullable: true description: | embeddings: - pooling: a dict with layer names as keys and pooling output as values. A pooling output is a dict with the pooling operation as key and a pooled embedding (list of floats) as value tokens: type: array items: type: string nullable: true num_tokens_prompt_total: type: integer description: | Number of tokens in the prompt. Tokenization: - Token ID arrays are used as-is. - Text prompt items are tokenized using the tokenizers specific to the model. - Each image is converted into a fixed amount of tokens that depends on the chosen model.
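# Illustrative only, not part of the spec: a minimal Python sketch of calling /embed, reusing
# the EmbeddingRequest example above (layers [0, 1], max pooling). <AA_TOKEN> is a placeholder.
#
#   import requests
#
#   response = requests.post(
#       "https://api.aleph-alpha.com/embed",
#       headers={"Authorization": "Bearer <AA_TOKEN>"},
#       json={"model": "luminous-base",
#             "prompt": "An apple a day keeps the doctor away.",
#             "layers": [0, 1], "tokens": False, "pooling": ["max"]},
#   )
#   embeddings = response.json()["embeddings"]  # e.g. embeddings["layer_0"]["max"] is a list of floats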
SemanticEmbeddingRequest: type: object example: model: luminous-base prompt: An apple a day keeps the doctor away. representation: "symmetric" compress_to_size: 128 properties: model: type: string description: Name of the model to use. A model name refers to a model's architecture (number of parameters among others). The most recent version of the model is always used. The model output contains information as to the model version. To create semantic embeddings, please use `luminous-base`. hosting: $ref: "#/components/schemas/Hosting" prompt: $ref: "#/components/schemas/Prompt" representation: type: string enum: ["symmetric", "document", "query"] description: | Type of embedding representation to embed the prompt with. `"symmetric"`: Symmetric embeddings assume that the text to be compared is interchangeable. Usage examples for symmetric embeddings are clustering, classification, anomaly detection or visualisation tasks. "symmetric" embeddings should be compared with other "symmetric" embeddings. `"document"` and `"query"`: Asymmetric embeddings assume that there is a difference between queries and documents. They are used together in use cases such as search where you want to compare shorter queries against larger documents. `"query"`-embeddings are optimized for shorter texts, such as questions or keywords. `"document"`-embeddings are optimized for larger pieces of text to compare queries against. compress_to_size: type: integer enum: [128] nullable: true description: | The default behavior is to return the full embedding with 5120 dimensions. With this parameter you can compress the returned embedding to 128 dimensions. The compression is expected to result in a small drop in accuracy performance (4-6%), with the benefit of being much smaller, which makes comparing these embeddings much faster for use cases where speed is critical. The compressed embeddings can also perform better if you are embedding really short texts or documents. normalize: type: boolean default: false description: | Return normalized embeddings. This can be used to save on additional compute when applying a cosine similarity metric. contextual_control_threshold: type: number default: null nullable: true description: | If set to `null`, attention control parameters only apply to those tokens that have explicitly been set in the request. If set to a non-null value, we apply the control parameters to similar tokens as well. Controls that have been applied to one token will then be applied to all other tokens that have at least the similarity score defined by this parameter. The similarity score is the cosine similarity of token embeddings. control_log_additive: type: boolean default: true description: | `true`: apply controls on prompt items by adding the `log(control_factor)` to attention scores. `false`: apply controls on prompt items by `(attention_scores - -attention_scores.min(-1)) * control_factor` required: - prompt - representation SemanticEmbeddingResponse: type: object example: model_version: 2021-12 embedding: [ -0.053497314, 0.0053749084, 0.06427002, 0.05316162, -0.0044059753, ..., ] num_tokens_prompt_total: 42 properties: model_version: type: string description: model name and version (if any) of the used model for inference embedding: type: array items: type: number description: A list of floats that can be used to compare against other embeddings. num_tokens_prompt_total: type: integer description: | Number of tokens in the prompt. Tokenization: - Token ID arrays are used as-is.
- Text prompt items are tokenized using the tokenizers specific to the model. - Each image is converted into a fixed amount of tokens that depends on the chosen model. BatchSemanticEmbeddingRequest: type: object example: model: luminous-base prompts: ["An apple a day keeps the doctor away."] representation: "symmetric" compress_to_size: 128 properties: model: type: string description: Name of the model to use. A model name refers to a model's architecture (number of parameters among others). The most recent version of the model is always used. The model output contains information as to the model version. To create semantic embeddings, please use `luminous-base`. hosting: $ref: "#/components/schemas/Hosting" prompts: type: array items: $ref: "#/components/schemas/Prompt" representation: type: string enum: ["symmetric", "document", "query"] description: | Type of embedding representation to embed the prompts with. `"symmetric"`: Symmetric embeddings assume that the text to be compared is interchangeable. Usage examples for symmetric embeddings are clustering, classification, anomaly detection or visualisation tasks. "symmetric" embeddings should be compared with other "symmetric" embeddings. `"document"` and `"query"`: Asymmetric embeddings assume that there is a difference between queries and documents. They are used together in use cases such as search where you want to compare shorter queries against larger documents. `"query"`-embeddings are optimized for shorter texts, such as questions or keywords. `"document"`-embeddings are optimized for larger pieces of text to compare queries against. compress_to_size: type: integer enum: [128] nullable: true description: | The default behavior is to return the full embedding with 5120 dimensions. With this parameter you can compress the returned embedding to 128 dimensions. The compression is expected to result in a small drop in accuracy performance (4-6%), with the benefit of being much smaller, which makes comparing these embeddings much faster for use cases where speed is critical. The compressed embeddings can also perform better if you are embedding really short texts or documents. normalize: type: boolean default: false description: | Return normalized embeddings. This can be used to save on additional compute when applying a cosine similarity metric. contextual_control_threshold: type: number default: null nullable: true description: | If set to `null`, attention control parameters only apply to those tokens that have explicitly been set in the request. If set to a non-null value, we apply the control parameters to similar tokens as well. Controls that have been applied to one token will then be applied to all other tokens that have at least the similarity score defined by this parameter. The similarity score is the cosine similarity of token embeddings. control_log_additive: type: boolean default: true description: | `true`: apply controls on prompt items by adding the `log(control_factor)` to attention scores.
`false`: apply controls on prompt items by `(attention_scores - -attention_scores.min(-1)) * control_factor` required: - prompts - representation BatchSemanticEmbeddingResponse: type: object example: model_version: 2021-12 embeddings: [ [ -0.053497314, 0.0053749084, 0.06427002, 0.05316162, -0.0044059753, ..., ], [ -0.053497314, 0.0053749084, 0.06427002, 0.05316162, -0.0044059753, ..., ], ] num_tokens_prompt_total: 42 properties: model_version: type: string description: model name and version (if any) of the used model for inference embeddings: type: array items: type: array items: type: number description: A list of floats that can be used to compare against other embeddings. num_tokens_prompt_total: type: integer description: | Number of tokens in all prompts combined. Tokenization: - Token ID arrays are used as-is. - Text prompt items are tokenized using the tokenizers specific to the model. - Each image is converted into a fixed amount of tokens that depends on the chosen model. EvaluationRequest: type: object example: model: luminous-base prompt: An apple a day completion_expected: keeps the doctor away. properties: model: type: string description: Name of the model to use. A model name refers to a model architecture (number of parameters among others). The latest version of the model is always used. The model output contains information as to the model version. hosting: $ref: "#/components/schemas/Hosting" prompt: $ref: "#/components/schemas/Prompt" completion_expected: type: string description: The completion that you would expect the model to produce given the prompt. Unconditional completion can be used with an empty string (default). The prompt may contain a zero shot or few shot task. contextual_control_threshold: type: number default: null nullable: true description: | If set to `null`, attention control parameters only apply to those tokens that have explicitly been set in the request. If set to a non-null value, we apply the control parameters to similar tokens as well. Controls that have been applied to one token will then be applied to all other tokens that have at least the similarity score defined by this parameter. The similarity score is the cosine similarity of token embeddings. control_log_additive: type: boolean default: true description: | `true`: apply controls on prompt items by adding the `log(control_factor)` to attention scores. `false`: apply controls on prompt items by `(attention_scores - -attention_scores.min(-1)) * control_factor` required: - model - prompt - completion_expected EvaluationResponse: type: object example: model_version: 2021-12 result: { log_probability: -1.2281955, log_perplexity: 1.2281955, log_perplexity_per_token: 0.24563909, log_perplexity_per_character: 1.2281955, correct_greedy: true, token_count: 5, character_count: 1, completion: " keeps the doctor away.", } num_tokens_prompt_total: 9 properties: model_version: type: string description: model name and version (if any) of the used model for inference result: type: object description: dictionary with result metrics of the evaluation properties: log_probability: type: number nullable: true description: log probability of producing the expected completion given the prompt. This metric refers to all tokens and is therefore dependent on the used tokenizer. It cannot be directly compared among models with different tokenizers. log_perplexity: type: number nullable: true description: log perplexity associated with the expected completion given the prompt.
This metric refers to all tokens and is therefore dependent on the used tokenizer. It cannot be directly compared among models with different tokenizers. log_perplexity_per_token: type: number nullable: true description: log perplexity associated with the expected completion given the prompt normalized for the number of tokens. This metric computes an average per token and is therefore dependent on the used tokenizer. It cannot be directly compared among models with different tokenizers. log_perplexity_per_character: type: number nullable: true description: log perplexity associated with the expected completion given the prompt normalized for the number of characters. This metric is independent of any tokenizer. It can be directly compared among models with different tokenizers. correct_greedy: type: boolean nullable: true description: Flag indicating whether a greedy completion would have produced the expected completion. token_count: type: integer nullable: true description: Number of tokens in the expected completion. character_count: type: integer nullable: true description: Number of characters in the expected completion. completion: type: string nullable: true description: argmax completion given the input consisting of prompt and expected completion. This may be used as an indicator of what the model would have produced. As only one single forward is performed an incoherent text could be produced especially for long expected completions. num_tokens_prompt_total: type: integer description: | The sum over the number of tokens of both the `prompt` and the `completion_expected` fields. Tokenization: - Token ID arrays are used as as-is. - Text prompt items are tokenized using the tokenizers specific to the model. - Each image is converted into a fixed amount of tokens that depends on the chosen model. SummarizationRequest: type: object example: model: luminous-extended document: { "text": "Some people like pizza more than burgers. Other people don't. But we all love food.", } properties: hosting: $ref: "#/components/schemas/Hosting" document: $ref: "#/components/schemas/Document" required: - document SummarizationResponse: type: object example: summary: "All people love food" model_version: 2021-12 properties: model_version: type: string description: model name and version (if any) of the used model for inference summary: type: string description: Summary of the document. For longer documents, this may be a bulleted list of summaries for sections of the document. TokenizationRequest: type: object example: model: luminous-base prompt: An apple a day keeps the doctor away. tokens: true token_ids: true properties: model: type: string prompt: type: string tokens: type: boolean token_ids: type: boolean required: - model - prompt - tokens - token_ids TokenizationResponse: type: object example: tokens: [ "ĠAn", "Ġapple", "Ġa", "Ġday", "Ġkeeps", "Ġthe", "Ġdoctor", "Ġaway", ".", ] token_ids: [560, 34438, 246, 1966, 18075, 275, 8809, 3476, 17] properties: tokens: type: array items: type: string token_ids: type: array items: type: integer DetokenizationRequest: type: object example: model: luminous-base token_ids: [560, 34438, 246, 1966, 18075, 275, 8809, 3476, 17] properties: model: type: string token_ids: type: array items: type: integer required: - model - token_ids DetokenizationResponse: type: object example: result: " An apple a day keeps the doctor away." 
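# Illustrative only, not part of the spec: a minimal Python sketch of a tokenize/detokenize
# round trip, reusing the TokenizationRequest and DetokenizationResponse example values above.
# <AA_TOKEN> is a placeholder.
#
#   import requests
#
#   headers = {"Authorization": "Bearer <AA_TOKEN>"}
#   tok = requests.post("https://api.aleph-alpha.com/tokenize", headers=headers,
#                       json={"model": "luminous-base",
#                             "prompt": "An apple a day keeps the doctor away.",
#                             "tokens": True, "token_ids": True}).json()
#   detok = requests.post("https://api.aleph-alpha.com/detokenize", headers=headers,
#                         json={"model": "luminous-base",
#                               "token_ids": tok["token_ids"]}).json()
#   assert detok["result"].strip() == "An apple a day keeps the doctor away."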
properties: result: type: string RecentRequestsResponse: type: array items: type: object properties: create_timestamp: type: string model_name: type: string request_type: type: string token_count_prompt: type: integer image_count_prompt: type: integer token_count_completion: type: integer duration_millis: type: integer credits: type: number UserDetail: type: object required: - id - email - role - credits_remaining - invoice_allowed - out_of_credits_threshold - terms_of_service_version properties: id: type: number description: User ID email: type: string description: Email address of the user role: type: string description: Role of the user token: type: string description: Legacy access token, will be `null` for new accounts and eventually deleted. deprecated: true credits_remaining: type: number description: Remaining credits for this user invoice_allowed: type: boolean description: Is this user post-paid? out_of_credits_threshold: type: integer description: Threshold for out-of-credits notification. If the threshold gets crossed with a task, then we trigger an email. terms_of_service_version: type: string description: Version string of the terms of service that the user has accepted UserChange: type: object properties: out_of_credits_threshold: type: integer description: Threshold for out-of-credits notification. If the threshold gets crossed with a task, then we trigger an email. Permissions: title: List of permissions type: array items: type: object properties: permission: type: string tags: - name: API description - name: tokens description: Manage tokens associated with your user account for API access. - name: models - name: tasks description: Requests for different types of tasks you can request with our models. paths: /version: get: summary: Current API version description: Will return the version number of the API that is deployed to this environment. 
operationId: version tags: - API description responses: "200": description: OK content: text/plain: schema: type: string example: 1.0.0 /users/me/tokens: get: summary: Get a list of issued API tokens description: | Will return a list of API tokens that are registered for this user (only token metadata is returned, not the actual tokens) operationId: tokens tags: - tokens security: - token: [] responses: "200": description: OK content: application/json: schema: type: array items: type: object properties: description: type: string example: "token used on my laptop" description: | A simple description that was supplied when creating the token token_id: type: integer description: | The token ID to use when calling other endpoints required: - description - token_id post: summary: Create a new API token description: | Create a new token to authenticate against the API with (the actual API token is only returned when calling this endpoint) operationId: newToken tags: - tokens security: - token: [] requestBody: required: true content: application/json: schema: type: object properties: description: type: string example: "token used on my laptop" description: | a simple description to remember the token by required: - description responses: "200": description: OK content: application/json: schema: type: object properties: metadata: type: object properties: description: type: string description: the description you provided token_id: type: number description: the ID of the API token required: - description - token_id token: type: string description: the API token that can be used in the Authorization header required: - metadata - token /users/me/tokens/{token_id}: parameters: - name: token_id schema: type: integer format: int32 in: path description: API token ID required: true delete: summary: Delete an API token operationId: deleteToken tags: - tokens security: - token: [] responses: "204": description: No Content /models/{modelName}/tokenizer: parameters: - name: modelName in: path description: Name of the model required: true schema: type: string get: operationId: getModelTokenizer summary: Gets the tokenizer that was used to train a model description: Returns a representation of the tokenizer that was used to train that model tags: - models - tokenizers security: - token: [] responses: "200": description: OK content: application/json: schema: type: object /models_available: get: summary: Currently available models. This interface is deprecated and will be removed in a later version. deprecated: true description: Will return all currently available models. operationId: availableModels tags: - models security: - token: [] responses: "200": description: OK content: application/json: schema: type: array items: properties: name: type: string description: type: string hostings: type: array items: type: string /complete: post: summary: Completion description: | Will complete a prompt using a specific model. To obtain a valid model, use `GET` `/models_available`. operationId: complete tags: - tasks security: - token: [] parameters: - in: query name: nice schema: type: boolean description: | Setting this to True, will signal to the API that you intend to be nice to other users by de-prioritizing your request below concurrent ones. 
requestBody: required: true content: application/json: schema: $ref: "#/components/schemas/CompletionRequest" responses: "200": description: OK content: application/json: schema: $ref: "#/components/schemas/CompletionResponse" /embed: post: summary: Embeddings description: Embeds a text using a specific model. Resulting vectors that can be used for downstream tasks (e.g. semantic similarity) and models (e.g. classifiers). To obtain a valid model, use `GET` `/models_available`. operationId: embed tags: - tasks security: - token: [] parameters: - in: query name: nice schema: type: boolean description: | Setting this to True, will signal to the API that you intend to be nice to other users by de-prioritizing your request below concurrent ones. requestBody: required: true content: application/json: schema: $ref: "#/components/schemas/EmbeddingRequest" responses: "200": description: OK content: application/json: schema: $ref: "#/components/schemas/EmbeddingResponse" /semantic_embed: post: summary: Semantic Embeddings description: Embeds a prompt using a specific model and semantic embedding method. Resulting vectors that can be used for downstream tasks (e.g. semantic similarity) and models (e.g. classifiers). To obtain a valid model, use `GET` `/models_available`. operationId: semanticEmbed tags: - tasks security: - token: [] parameters: - in: query name: nice schema: type: boolean description: | Setting this to True, will signal to the API that you intend to be nice to other users by de-prioritizing your request below concurrent ones. requestBody: required: true content: application/json: schema: $ref: "#/components/schemas/SemanticEmbeddingRequest" responses: "200": description: OK content: application/json: schema: $ref: "#/components/schemas/SemanticEmbeddingResponse" /batch_semantic_embed: post: summary: Batched Semantic Embeddings description: Embeds multiple prompts using a specific model and semantic embedding method. Resulting vectors that can be used for downstream tasks (e.g. semantic similarity) and models (e.g. classifiers). To obtain a valid model, use `GET` `/models_available`. operationId: batchSemanticEmbed tags: - tasks security: - token: [] parameters: - in: query name: nice schema: type: boolean description: | Setting this to True, will signal to the API that you intend to be nice to other users by de-prioritizing your request below concurrent ones. requestBody: required: true content: application/json: schema: $ref: "#/components/schemas/BatchSemanticEmbeddingRequest" responses: "200": description: OK content: application/json: schema: $ref: "#/components/schemas/BatchSemanticEmbeddingResponse" /evaluate: post: summary: Evaluate description: Evaluates the model's likelihood to produce a completion given a prompt. operationId: evaluate tags: - tasks security: - token: [] parameters: - in: query name: nice schema: type: boolean description: | Setting this to True, will signal to the API that you intend to be nice to other users by de-prioritizing your request below concurrent ones. requestBody: required: true content: application/json: schema: $ref: "#/components/schemas/EvaluationRequest" responses: "200": description: OK content: application/json: schema: $ref: "#/components/schemas/EvaluationResponse" /explain: post: operationId: explain summary: Explanation description: | Better understand the source of a completion, specifically on how much each section of a prompt impacts each token of the completion. 
tags: - tasks security: - token: [] parameters: - in: query name: nice schema: type: boolean description: | Setting this to True, will signal to the API that you intend to be nice to other users by de-prioritizing your request below concurrent ones. requestBody: required: true content: application/json: schema: $ref: "#/components/schemas/ExplanationRequest" responses: "200": description: OK content: application/json: schema: $ref: "#/components/schemas/ExplanationResponse" /tokenize: post: summary: Tokenize description: Tokenize a prompt for a specific model. To obtain a valid model, use `GET` `/models_available`. operationId: tokenize tags: - tasks security: - token: [] requestBody: required: true content: application/json: schema: $ref: "#/components/schemas/TokenizationRequest" responses: "200": description: OK content: application/json: schema: $ref: "#/components/schemas/TokenizationResponse" /detokenize: post: summary: Detokenize description: Detokenize a list of tokens into a string. To obtain a valid model, use `GET` `/models_available`. operationId: detokenize tags: - tasks security: - token: [] requestBody: required: true content: application/json: schema: $ref: "#/components/schemas/DetokenizationRequest" responses: "200": description: OK content: application/json: schema: $ref: "#/components/schemas/DetokenizationResponse" /qa: post: summary: Q&A description: | Will answer a question about text given in a prompt. This interface is deprecated and will be removed in a later version. New methodologies for processing Q&A tasks will be provided before this is removed. deprecated: true operationId: qa tags: - tasks security: - token: [] parameters: - in: query name: nice schema: type: boolean description: | Setting this to True, will signal to the API that you intend to be nice to other users by de-prioritizing your request below concurrent ones. requestBody: required: true content: application/json: schema: $ref: "#/components/schemas/QARequest" responses: "200": description: OK content: application/json: schema: $ref: "#/components/schemas/QAResponse" /summarize: post: summary: Summarize description: | Will summarize a document using a specific model. This interface is deprecated and will be removed in a later version. New methodologies for processing Summarization tasks will be provided before this is removed. deprecated: true operationId: summarize tags: - tasks security: - token: [] parameters: - in: query name: nice schema: type: boolean description: | Setting this to True, will signal to the API that you intend to be nice to other users by de-prioritizing your request below concurrent ones. requestBody: required: true content: application/json: schema: $ref: "#/components/schemas/SummarizationRequest" responses: "200": description: OK content: application/json: schema: $ref: "#/components/schemas/SummarizationResponse" /openapi.yaml: get: summary: OpenAPI specification description: | Returns the latest OpenAPI specification for this API. tags: - API description responses: "200": description: OK content: text/yaml: schema: type: string /openapi-description: get: summary: Get available OpenAPI description versions description: | Returns the available versions of OpenAPI description for this API. 
tags: - API description responses: "200": description: OK content: application/json: schema: type: array items: type: string /openapi-description/{version}: parameters: - name: version schema: type: string in: path description: API version for which to get OpenAPI description required: true get: summary: Get a specified OpenAPI description version description: | Returns the requested version of the OpenAPI description for this API. tags: - API description responses: "200": description: OK content: text/yaml: schema: type: string /users/login: post: summary: Login with email and password. description: | This should not be used by most API consumers, who should generate API tokens using the login page and use bearer authentication. This endpoint allows bootstrapping the API via login to generate more API tokens. tags: - users requestBody: required: true content: application/json: schema: type: object properties: email: type: string description: the email address password: type: string description: the password required: - email - password responses: "200": description: OK content: application/json: schema: $ref: "#/components/schemas/UserDetail" headers: Set-Cookie: schema: type: string /users/me/requests: get: summary: Query Recent Usage. This interface is deprecated and will be removed in a later version. deprecated: true description: | A list of the ten most recent tasks successfully completed by the API. Contains statistics about the task, including duration of execution and cost in credits. operationId: recentRequests tags: - tasks security: - token: [] responses: "200": description: OK content: application/json: schema: $ref: "#/components/schemas/RecentRequestsResponse" /users/me: get: tags: - users summary: Get settings for own user description: | Returns details of this user. Can be called by a user who has access to this user ID or by an admin for any user. security: - token: [] responses: "200": description: OK content: application/json: schema: $ref: "#/components/schemas/UserDetail" patch: tags: - users summary: Change settings for own user description: | This route currently only supports changing the out_of_credits_threshold. security: - token: [] requestBody: required: true content: application/json: schema: $ref: "#/components/schemas/UserChange" responses: "200": description: OK content: application/json: schema: $ref: "#/components/schemas/UserDetail" /check_privileges: post: operationId: postCheckPrivileges summary: Check a user's privileges description: | Post an array of permissions as an authenticated user. We return the subset of the posted permissions that have been granted to the user. tags: - permissions security: - token: [] responses: "200": description: OK content: application/json: schema: $ref: "#/components/schemas/Permissions"
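# Illustrative only, not part of the spec: a minimal Python sketch of comparing two prompts with
# /semantic_embed. With `normalize: true`, cosine similarity of the returned vectors reduces to a
# plain dot product. The prompts below are made-up examples and <AA_TOKEN> is a placeholder.
#
#   import requests
#
#   def embed(text):
#       body = {"model": "luminous-base", "prompt": text,
#               "representation": "symmetric", "normalize": True}
#       r = requests.post("https://api.aleph-alpha.com/semantic_embed",
#                         headers={"Authorization": "Bearer <AA_TOKEN>"}, json=body)
#       return r.json()["embedding"]
#
#   a = embed("An apple a day keeps the doctor away.")
#   b = embed("Eating fruit is healthy.")
#   similarity = sum(x * y for x, y in zip(a, b))  # cosine similarity for normalized vectors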