
Fine Tuning and RAG Explained


As I continue my LLM learning journey, I have encountered terms such as fine tuning and retrieval-augmented generation (RAG). Initially the distinction between the two wasn’t clear to me. In this article I explain how they differ.

First, let’s start with the less grandiose term: fine tuning.

Fine tuning is the process of further training a pre-trained model (usually a foundation model) in order to enhance its performance in a specific area or capability. For example, a foundation model may be fine tuned to specialize in English-to-Spanish translation. Fine tuning requires access to the model’s weights and involves adjusting the model parameters, so fine tuning of closed models such as GPT can only be done through OpenAI, BART through Facebook, and so on. Fine tuning consumes a lot of computational resources and is an expensive endeavor, because performant LLMs contain billions of parameters that have to be adjusted, and the greater the number of parameters, the more computing power is needed.

Fine Tuning is an example of a technique known as Transfer Learning.

Transfer Learning involves reusing a model trained to solve one problem as the starting point for solving a different problem. In Transfer Learning, the knowledge gained on problem A is used to help solve problem B. For example, a classifier that was trained to recognize cars could be adapted to identify other objects such as traffic lights.

Common reasons for fine tuning a model include:

  • Need for optimal performance on specific rather than general tasks, e.g. English-to-Spanish translation
  • Limited data availability
  • Less expensive and more efficient than training a model from scratch
  • Data security and compliance
    • For sensitive data, companies can fine tune local versions of models within their own infrastructure.

Fine tuning uses task and domain specific training data to train the model.

The steps involved in fine tuning are as follows:

  • Identify an appropriate pre-trained LLM
  • Gather the relevant dataset – split it into training, validation and test sets
  • Make a copy of the pre-trained model, using all of its layers except the output layer, to create a target model.
    • This involves initializing the LLM with its pre-trained weights.
  • Add a new task-specific output layer to the target model, making sure that the number of outputs matches the number of classes in the target dataset.
  • Randomly initialize the values of the model parameters in the output layer.
  • Train the target model on the dataset, e.g. English-to-Spanish data. This step can be broken down into these smaller subtasks:
    • Updating model weights using backpropagation and gradient descent
    • Hyper-parameter tuning – learning rate, batch size and regularization strength are adjusted
    • Validation – model performance is assessed using the validation dataset and adjustments are made to avoid over- or underfitting
    • Testing – the model is evaluated on the test dataset to assess performance on unseen data
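The steps above can be illustrated with a toy model in plain Python. The “pre-trained” hidden-layer weights, the training data and the hyper-parameters below are all invented for demonstration; a real fine-tune would use a framework such as PyTorch and an actual pre-trained LLM, but the structure is the same: freeze the pre-trained layers, attach a randomly initialized output layer, and train it with gradient descent.

```python
import random

random.seed(0)

# Pretend these hidden-layer weights come from a pre-trained model.
# We keep (freeze) them and only train a new output layer.
PRETRAINED_HIDDEN = [[0.5, -0.2], [0.1, 0.9]]  # 2 inputs -> 2 hidden units

def hidden(x):
    # Frozen pre-trained feature extractor (linear, no activation, for brevity).
    return [sum(w * xi for w, xi in zip(row, x)) for row in PRETRAINED_HIDDEN]

# New task-specific output layer with randomly initialized weights.
out_w = [random.uniform(-0.1, 0.1) for _ in range(2)]

def predict(x):
    h = hidden(x)
    return sum(w * hi for w, hi in zip(out_w, h))

# Train only the output layer with gradient descent on toy task data.
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0)]  # (input, target) pairs
lr = 0.1  # hyper-parameter: learning rate
for _ in range(200):
    for x, y in data:
        h = hidden(x)
        err = predict(x) - y
        # Gradient of the squared error with respect to each output weight.
        out_w = [w - lr * err * hi for w, hi in zip(out_w, h)]

for x, y in data:
    print(round(predict(x), 2), "target", y)
```

After training, the model fits the new task even though the hidden layer was never touched – which is the essence of transfer learning.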

See this article for an implementation of fine tuning.

We now turn our attention to RAG – Retrieval Augmented Generation.

Retrieval Augmented Generation is a mechanism that can be applied to pre-trained large language models to provide better answers by utilizing information stored in an external knowledge base.

In this process, a user submits a query to the model API. Behind the scenes, a search is conducted to retrieve documents that may contain information semantically relevant to the query, and this information is appended to the query in the form of a prompt before being submitted to the LLM. In this way, the quality of the response from the LLM can be greatly enhanced.

There are many reasons and use cases where RAG is helpful when working with LLMs.

  • Pre-trained foundation models do not take into account content that is recent or was not part of the original corpus of documents they were trained on.
  • Fine-tuning a pre-trained model is time consuming and can be expensive.
  • RAG offers a way to verify the output since the external data that is used is readily available. In a fine-tuned or pre-trained model this is not the case.
  • Using RAG reduces the occurrences of hallucinations.

RAG Workflow

The RAG workflow may be summarized as follows:

  1. User submits query to Chat API/LLM Interface
  2. An embedding model converts the query into a vector embedding to be used in search
  3. Relevant documents (context) are retrieved from a vector store or via a vector library
  4. The retrieved context is appended to the query in the form of a prompt, which is submitted to the LLM
  5. The language model outputs a response
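The workflow above can be sketched in a few lines of Python. The documents, the bag-of-words “embedding” and the cosine-similarity retrieval below are toy stand-ins for a real embedding model and vector store, and the final LLM call is omitted; only the retrieve-and-augment steps are shown.

```python
import math
from collections import Counter

# Toy knowledge base; in practice these would live in a vector store.
DOCS = [
    "The Eiffel Tower is in Paris and is 330 metres tall.",
    "Python is a programming language created by Guido van Rossum.",
    "Canyonlands National Park is located near Moab, Utah.",
]

def embed(text):
    # Step 2: convert text to a vector (here, simple word counts).
    return Counter(w.strip(".,?!") for w in text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    # Step 3: rank documents by similarity to the query embedding.
    q = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query):
    # Step 4: append the retrieved context to the user query.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Steps 1 and 5 would be the chat interface and the LLM call itself.
print(build_prompt("How tall is the Eiffel Tower?"))
```

Note that the underlying model is never modified; only the prompt changes, which is the key difference from fine tuning.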

RAG Use Cases

  • Customer support – Companies can provide customer support to external customers via a chatbot/chat api. They can also provide information internally to employees using their corporate data as the knowledge base.
  • Real-time chatbots – RAG can be used to enhance the capabilities of chatbots to provide up-to-date sports, weather and stock market information.

Fine Tuning vs RAG

  1. Fine tuning is often more expensive than RAG in terms of both time and resources.
  2. Fine tuning involves training a pre-trained foundational model on specific tasks, tweaking the model parameters to do so. RAG, on the other hand, focuses on connecting the LLM to external sources and using the retrieved context to help the LLM better respond to user queries. RAG doesn’t make any adjustments to the underlying model.
  3. Fine tuned models are more prone to hallucinations when faced with unfamiliar inputs. RAG systems tend to hallucinate less because they ground their responses based on retrieved context data, and the responses can be validated against such data.
  4. Fine tuned models can behave like a black box, while RAG systems are more transparent and interpretable.
  5. Fine tuning requires more specialized expertise compared to implementing RAG.

In this article I have expounded on these two methods of enhancing the power of large language models, fine tuning and retrieval-augmented generation, highlighted their differences and similarities, and discussed when to use one over the other.