Prompt Template – Llama 2 by BackProp

My reference for this article is the following: Llama 2 Prompt Template, together with this notebook, whose source is here.

Only the chat versions of the models have a prompt template.

Another important point related to the data quality is the prompt template. Prompts are comprised of similar elements: system prompt (optional) to guide the model, user prompt (required) to give the instruction, additional inputs (optional) to take into consideration, and the model’s answer (required). In the case of Llama 2, the authors used the following template:
https://towardsdatascience.com/fine-tune-your-own-llama-2-model-in-a-colab-notebook-df9823a04a32

Meta explains the prompt template in the Llama 2 "docs":

The fine-tuned models were trained for dialogue applications. To get the expected features and performance for them, a specific formatting defined in chat_completion needs to be followed, including the INST and <<SYS>> tags, BOS and EOS tokens, and the whitespaces and breaklines in between (we recommend calling strip() on inputs to avoid double-spaces).
https://github.com/facebookresearch/llama#pretrained-models

Terse and cryptic, as is often the case!

So I carried on by reading the code.

B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

DEFAULT_SYSTEM_PROMPT = """\
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."""

That is a little clearer, but still far from explicit. Rather than dissect the code line by line, I chose to rely on an article by someone who has already done the work for us!

With regular newlines, e.g. for using with text-generation-webui:
https://gpus.llm-utils.org/llama-2-prompt-template/#fn:1

<s>[INST] <<SYS>>
System prompt
<</SYS>>

User prompt [/INST] Model answer </s>

Without a system message, it’d be like this:
https://gpus.llm-utils.org/llama-2-prompt-template/#fn:1

<s>[INST] {user_message_1} [/INST]
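To make the two single-turn layouts above concrete, here is a minimal sketch that assembles the prompt string. The helper name build_prompt and its parameters are hypothetical names of my own, not part of Meta's code. The <s> token is written out literally here; many tokenizers add the BOS token themselves, in which case it should be dropped from the string.

```python
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
BOS = "<s>"  # assumption: BOS is written as text, not added by the tokenizer


def build_prompt(user_message, system_prompt=None):
    """Return a single-turn Llama 2 chat prompt (hypothetical helper)."""
    # strip() on inputs avoids double spaces, as the Llama 2 README recommends.
    user_message = user_message.strip()
    if system_prompt:
        # The system prompt is wrapped in <<SYS>> tags inside the [INST] block.
        user_message = B_SYS + system_prompt.strip() + E_SYS + user_message
    return f"{BOS}{B_INST} {user_message} {E_INST}"


print(build_prompt("User prompt", "System prompt"))
print(build_prompt("User prompt"))
```

The model's answer is not part of the string: it is what the model generates after [/INST].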

If you need newlines escaped, e.g. for using with curl or in the terminal:
https://gpus.llm-utils.org/llama-2-prompt-template/#fn:1

<s>[INST] <<SYS>>\n{your_system_message}\n<</SYS>>\n\n{user_message_1} [/INST]
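As an illustration of the escaped form, the sketch below converts a prompt containing real newlines into the one-line version by serialising it as JSON, which is typically what a curl request body carries. The "prompt" key in the request body is an assumption for illustration; the actual field name depends on the server being called.

```python
import json

# A prompt with real newlines, following the template above.
prompt = "<s>[INST] <<SYS>>\nSystem prompt\n<</SYS>>\n\nUser prompt [/INST]"

# json.dumps writes each newline as the two characters backslash and "n".
escaped = json.dumps(prompt)[1:-1]  # drop the surrounding double quotes
print(escaped)

# Hypothetical request body; the "prompt" key is an assumption.
body = json.dumps({"prompt": prompt})
print(body)
```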

Since Llama 2 has a short memory (the model itself keeps no state from one call to the next), it is often necessary to add the previous exchange of the conversation to the prompt in order to provide context.

To append model responses and continue a conversation, it should look like this
https://gpus.llm-utils.org/llama-2-prompt-template/#fn:1

<s>[INST] <<SYS>>\n{your_system_message}\n<</SYS>>\n\n{user_message_1} [/INST] {model_reply_1}</s><s>[INST] {user_message_2} [/INST]

With regular newlines, e.g. for using with text-generation-webui:
https://gpus.llm-utils.org/llama-2-prompt-template/#fn:1

<s>[INST] <<SYS>>
{your_system_message}
<</SYS>>

{user_message_1} [/INST] {model_reply_1}</s><s>[INST] {user_message_2} [/INST]
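The multi-turn layout can be sketched the same way. The helper below is hypothetical (the name build_conversation and its arguments are mine) and follows the layout quoted above: the system prompt is folded into the first user message only, and every completed exchange is closed with </s> before the next <s>[INST] opens.

```python
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"
BOS, EOS = "<s>", "</s>"


def build_conversation(system_prompt, turns, next_user_message):
    """Build a multi-turn Llama 2 chat prompt (hypothetical helper).

    turns is a list of (user_message, model_reply) pairs already exchanged.
    """
    messages = [user.strip() for user, _ in turns] + [next_user_message.strip()]
    replies = [reply.strip() for _, reply in turns]
    if system_prompt:
        # Only the first user message carries the system prompt.
        messages[0] = B_SYS + system_prompt.strip() + E_SYS + messages[0]
    parts = []
    for i, message in enumerate(messages):
        parts.append(f"{BOS}{B_INST} {message} {E_INST}")
        if i < len(replies):
            # A finished exchange ends with the model reply and the EOS token.
            parts.append(f" {replies[i]}{EOS}")
    return "".join(parts)


history = [("user_message_1", "model_reply_1")]
print(build_conversation("your_system_message", history, "user_message_2"))
```

Each new call resends the whole history, so the prompt grows with every turn and will eventually hit the model's context limit.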

To check that bos_token and eos_token do have the values <s> and </s> respectively, I downloaded the Llama 2 tokenizer and read the special_tokens_map.json file. Its content is as follows:

{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
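The same check can be scripted. In this sketch the content shown above is inlined as a string so that the example runs without downloading the tokenizer; in practice one would read it from the special_tokens_map.json file instead.

```python
import json

# Content of special_tokens_map.json, inlined from the listing above.
special_tokens_json = """
{
  "bos_token": {"content": "<s>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false},
  "eos_token": {"content": "</s>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false},
  "unk_token": {"content": "<unk>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false}
}
"""

special_tokens = json.loads(special_tokens_json)
bos = special_tokens["bos_token"]["content"]
eos = special_tokens["eos_token"]["content"]
print("bos_token:", bos)
print("eos_token:", eos)
```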