OpenBezoar: A Family of Small, Cost-Effective, and Open Source Artificial Intelligence Models Trained on Mixed Instruction Data

Written By Adarsh Shankar Jha

The recent success of instruction fine-tuning pre-trained Large Language Models (LLMs) for downstream tasks has attracted considerable interest in the Artificial Intelligence (AI) community, because it allows models to align with human preferences. To ensure that these fine-tuned models adequately represent human preferences, methods such as Direct Preference Optimization (DPO) and Reinforcement Learning from Human Feedback (RLHF) have been developed.

In Supervised Fine-Tuning (SFT), pre-trained LLMs are trained on instruction-response pairs, tailoring them to perform specific tasks. This not only helps them produce coherent answers, but also shows how supervised learning allows these models to adapt efficiently to different tasks by learning from examples.
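As a minimal illustration of what an SFT training example looks like, the snippet below concatenates an instruction with its target response into a single training string. The template here is a common convention, not the exact format used by OpenBezoar:

```python
# Hypothetical instruction template for SFT (illustrative, not the paper's exact format).
def format_example(instruction: str, response: str) -> str:
    """Concatenate an instruction and its target response into one training string."""
    return (
        "### Instruction:\n"
        f"{instruction}\n\n"
        "### Response:\n"
        f"{response}"
    )

sample = format_example(
    "Summarize the water cycle in one sentence.",
    "Water evaporates, condenses into clouds, and returns as precipitation.",
)
```

During SFT, the model is trained to predict the response tokens given the instruction prefix, so many such strings are all the supervision the model sees.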

Due to the sheer size of the most capable LLMs, which can exceed 100 billion parameters, many businesses and individuals cannot afford the computational cost of SFT. Studies have shown that models with fewer parameters can perform well in some cases and even outperform larger models. Traditionally, fine-tuning relies on datasets containing large numbers of human-written examples, which improves the adaptability of the resulting models. However, creating these datasets is expensive and time-consuming, and commercial use of models trained on them is often restricted by licensing requirements.

In a recent study, a team of researchers from Surge Global generated instruction-response pairs using open-source, instruction-tuned models licensed for commercial use in order to overcome these limitations. Three dataset-generation schemes were developed, producing instruction datasets that can be used commercially.

GPT-4 was then used as a human proxy to further filter these datasets for quality and diversity. SFT was applied to the chosen base model using QLoRA, yielding three adapter models. These models, together with an alignment-tuned model, make up the OpenBezoar family.
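QLoRA keeps the (quantized) base weights frozen and trains only small low-rank adapter matrices. The sketch below shows just the LoRA part of the forward pass in plain Python, ignoring 4-bit quantization; all names and toy shapes are illustrative, not taken from the paper's code:

```python
# Minimal LoRA sketch: y = W x + (alpha / r) * B (A x), with W frozen.
# Matrices are plain lists of rows; real implementations use GPU tensors.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha=16.0, r=2):
    """Frozen base weight W plus a trainable rank-r update B @ A, scaled by alpha/r."""
    base = matvec(W, x)                # frozen pre-trained path
    update = matvec(B, matvec(A, x))   # low-rank adapter path (r << model dim)
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Toy 2x2 example; the A matrix starts at zero, so at initialization
# the adapted layer behaves exactly like the frozen base layer.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.0, 0.0], [0.0, 0.0]]
B = [[0.1, 0.0], [0.0, 0.1]]
y = lora_forward(W, A, B, [1.0, 2.0])  # equals W @ x at initialization
```

Only `A` and `B` are updated during training, which is why QLoRA makes fine-tuning a multi-billion-parameter model affordable on modest hardware.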

The goal of this work is to develop the OpenBezoar family of models by fine-tuning the OpenLLaMA 3Bv2 base model. The process involves several steps:

  1. Data generation: An instruction-tuned version of the Falcon-40B model, which is openly available for commercial use, was used to generate synthetic instruction-tuning data. Three separate schemes were used: LaMini-LM, WizardLM/Evol-Instruct (with databricks-dolly-15k as the seed dataset), and Orca (with the Flan Collection as the seed dataset).
  2. Data filtering: To ensure quality and relevance, the generated data is filtered using GPT-4 as a human proxy.
  3. Supervised fine-tuning: Each scheme undergoes sequential QLoRA-based supervised fine-tuning, in which adapter parameters are updated to improve performance on specific tasks.
  4. Minimizing distribution shift: To ensure the model performs well across a variety of datasets, the supervised checkpoint is further fine-tuned on a subset of the HH-RLHF dataset.
  5. Direct Preference Optimization (DPO): Applying the DPO loss function yields the final checkpoint, “OpenBezoar-HH-RLHF-DPO”. In this step, the model is aligned directly with human preferences, removing the need for a separate reward model.
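The final DPO step optimizes preference pairs directly. Below is a hedged sketch of the per-pair DPO loss; the log-probability arguments are placeholders, since a real implementation would score complete responses with the trainable policy and a frozen reference model:

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Each argument is the summed token log-probability of a full response.
    The frozen reference model anchors the policy so it does not drift too far.
    """
    policy_margin = logp_chosen - logp_rejected
    ref_margin = ref_logp_chosen - ref_logp_rejected
    logits = beta * (policy_margin - ref_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)

# If the policy prefers the chosen response more strongly than the reference
# does, logits are positive and the loss falls below -log(0.5) ≈ 0.693.
loss = dpo_loss(-10.0, -14.0, -11.0, -13.0)
```

Because the preference signal enters the loss directly, no separate reward model has to be trained, which is exactly the simplification over RLHF that the article describes.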

The team reports that the final checkpoint was evaluated on MT-Bench using the “LLM-as-a-judge” framework with Claude 2.1, as well as on LM Eval Harness tasks. The results show that the “OpenBezoar-HH-RLHF-DPO” checkpoint outperforms many models at the 3B-parameter scale and beats the top model on the Hugging Face Open LLM Leaderboard in one of its categories.

The OpenBezoar-SFT, OpenBezoar-HH-RLHF-SFT, and OpenBezoar-HH-RLHF-DPO checkpoints have been released and are accessible on Hugging Face.


Check out the Paper, the HF Datasets, and the Codebase. All credit for this research goes to the researchers of this project.





Tanya Malhotra is a final-year student at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical-thinking skills, along with a keen interest in acquiring new skills, leading teams, and managing work in an organized manner.

