In a recent study, researchers from Shanghai Jiao Tong University, Hong Kong Polytechnic University, Beijing University of Posts and Telecommunications, and other institutions have developed a groundbreaking method, RefGPT, for generating truthful and customized dialogues with large language models such as GPT-3.5 and GPT-4. The new approach can efficiently create large dialogue datasets with minimal hallucination while offering detailed controls for high customization.

Overcoming Hallucinations in Dialogue Generation

Large language models (LLMs) have become indispensable for many natural language processing (NLP) tasks. While these models are powerful, fine-tuning them requires high-quality instruction data, which is often expensive and hard to obtain, especially for multi-turn dialogues. Additionally, synthetic dialogues generated by LLMs like ChatGPT often contain untruthful content, a phenomenon known as model hallucination.

RefGPT addresses this challenge by providing a reference, such as a plain text passage or document, to guide the LLM in generating dialogues. By grounding its output in the provided reference, the model can generate truthful and authentic dialogues, avoiding hallucination.
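To make the idea concrete, here is a minimal sketch of reference-grounded generation, assuming the OpenAI Python client; the prompt wording is illustrative, not RefGPT's actual template.

```python
# Minimal sketch of reference-grounded dialogue generation.
# Assumes the OpenAI Python client (openai>=1.0); the prompt wording is
# illustrative, not RefGPT's exact template.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_dialogue(reference: str, num_turns: int = 3) -> str:
    """Ask the model for a multi-turn dialogue grounded in `reference`."""
    prompt = (
        f"Using ONLY the facts in the reference below, write a {num_turns}-turn "
        "dialogue between a user and an assistant. Do not introduce any "
        "information that is absent from the reference.\n\n"
        f"Reference:\n{reference}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```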

Highly Customizable Dialogues

Reference Selection

A crucial aspect of generating truthful dialogues is selecting the right reference. RefGPT encourages the use of high-quality knowledge sources, such as Wikipedia, as references, ensuring that generated dialogues contain accurate information and adhere to the intended theme. This versatility allows RefGPT to produce dialogues in various domains, including factual knowledge, program code, and vertical domains such as shopping apps or the nuclear industry.
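As an illustration, a simple filter over candidate passages might look like the following; the source labels and length threshold are hypothetical assumptions, not the paper's actual selection procedure.

```python
# Illustrative reference filter; the source labels and the length threshold
# are hypothetical, not RefGPT's actual selection criteria.
TRUSTED_SOURCES = {"wikipedia", "official_docs"}

def select_references(passages: list[dict], min_words: int = 150) -> list[dict]:
    """Keep passages from trusted sources that are long enough to support
    a multi-turn dialogue. Each passage is a dict like
    {"text": "...", "source": "wikipedia"}."""
    return [
        p for p in passages
        if p["source"] in TRUSTED_SOURCES and len(p["text"].split()) >= min_words
    ]
```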

Basic Prompt and Dialogue Settings

To achieve customization, RefGPT provides the Basic Prompt, which can be tailored to steer the LLM toward generating specific responses. Users can customize key domains and slots within the Basic Prompt according to their needs.
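A slot-based template captures the flavor of this mechanism; the slot names below are illustrative assumptions, not the paper's exact schema.

```python
# Hypothetical Basic Prompt template; the slot names are illustrative,
# not taken verbatim from the paper.
BASIC_PROMPT = (
    "Generate a {num_turns}-turn dialogue about {domain} between "
    "{user_role} and {assistant_role}, based strictly on the given reference."
)

prompt = BASIC_PROMPT.format(
    num_turns=4,
    domain="program code",
    user_role="a junior developer",
    assistant_role="a senior engineer",
)
print(prompt)
```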

For even finer control over the LLM's behavior, RefGPT introduces Dialogue Settings. These let users adjust different aspects of the generated dialogue, including repetition, creativity, specificity, and informativeness, by turning specific features on or off. Together, reference selection, the Basic Prompt, and Dialogue Settings make it possible to generate highly truthful and customized dialogues.
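One plausible way to realize such on/off settings is to map each enabled flag to an extra prompt instruction; the flag names mirror the aspects listed above, and the instruction wording is an assumption rather than the paper's exact phrasing.

```python
# Sketch of rendering on/off Dialogue Settings into prompt instructions.
# Flag names and instruction text are illustrative assumptions.
SETTING_INSTRUCTIONS = {
    "no_repetition": "Do not repeat points across turns.",
    "creative": "Vary sentence structure and wording between turns.",
    "specific": "Prefer concrete details from the reference over generalities.",
    "informative": "Each assistant turn should add new information.",
}

def render_settings(settings: dict[str, bool]) -> str:
    """Turn the enabled flags into a block of extra instructions."""
    return "\n".join(
        text for flag, text in SETTING_INSTRUCTIONS.items() if settings.get(flag)
    )

extra_instructions = render_settings({"no_repetition": True, "specific": True})
print(extra_instructions)  # appended to the Basic Prompt before generation
```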

The Future of AI-driven Dialogues

The research behind RefGPT has produced two new multi-turn dialogue datasets with domain-specific content, RefGPT-Fact and RefGPT-Code. These datasets showcase RefGPT's potential to create high-quality, truthful, and customized dialogues, which is particularly significant for training LLMs in specialized domains such as factual question answering and programming.

By increasing truthfulness and customization, RefGPT holds the potential to significantly improve AI chatbot capabilities and streamline dialogue-based applications across various software systems. These advancements are an essential step in making AI-driven dialogues more reliable and user-friendly.

In conclusion, the RefGPT method provides a significant improvement over existing dialogue generation techniques by minimizing hallucinations and allowing high customization. This research demonstrates that AI can be guided to produce highly accurate and tailor-made dialogues by leveraging high-quality references. As AI continues to evolve, methods like RefGPT contribute to the enhancement of AI-driven communication, making it increasingly reliable for users across various domains.

Original Paper