This repo contains code and instructions for reproducing the experiments in the paper "Learning to Refine with Fine-Grained Natural Language Feedback". We propose a new method, Detect, Critique and Refine (DCR), for post-hoc editing of document-grounded summaries to make them more factual.
To run end-to-end editing with DCR, use the following Python snippet:
```python
from run_end_to_end_refinement.dcr import DCR

document_instruction = ''  # source document together with the summarization instruction
initial_response = ''      # initial summary to be refined
model = "llama3-ft"        # critique and refinement model: can be any HF model or GPT-4

dcr = DCR(cuda_id=0, model_name=model, path_to_minicheck="/home/mwadhwa/code/MiniCheck/", cache_dir="/data/users/mwadhwa/")
refinement = dcr.refine(source_text=document_instruction, initial_response=initial_response)
print(refinement)
```
Our fine-tuned feedback and refinement models are available on HuggingFace 🤗:
- Critique Model: Llama2-7b-Chat Fine-Tuned / Llama3-8b-Instruct Fine-Tuned
- Refinement Model: Llama2-7b-Chat Fine-Tuned / Llama3-8b-Instruct Fine-Tuned
The fine-tuning data distilled from GPT-4 is available on HuggingFace: https://huggingface.co/datasets/wadhma/dcr_data
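As a sketch of how these artifacts can be loaded, the snippet below pulls the fine-tuning data with the `datasets` library and a fine-tuned checkpoint with `transformers`. The dataset ID comes from the link above; the model repo ID is a placeholder, so substitute the actual critique or refinement model linked above.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-4-distilled fine-tuning data (critique + refinement examples)
dcr_data = load_dataset("wadhma/dcr_data")
print(dcr_data)  # inspect the available splits and columns

# Placeholder repo ID -- replace with the critique or refinement model linked above.
model_id = "<critique-or-refinement-model-id>"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```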
You need to set up the following:
- pip install -r requirements.txt
- Set up MiniCheck here (see the sanity-check sketch below)
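A minimal sanity check for the MiniCheck setup, assuming MiniCheck's published `MiniCheck` scorer interface and the same local clone path passed to `DCR` above (adjust both to your environment):

```python
import sys

# Make the cloned MiniCheck repo importable (same path as path_to_minicheck above).
sys.path.insert(0, "/home/mwadhwa/code/MiniCheck/")

from minicheck.minicheck import MiniCheck  # assumed to match the public MiniCheck repo

scorer = MiniCheck(model_name="flan-t5-large", cache_dir="./ckpts")
labels, probs, _, _ = scorer.score(
    docs=["The cat sat on the mat."],
    claims=["A cat was sitting on a mat."],
)
print(labels, probs)  # 1 = claim supported by the document, 0 = unsupported
```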
We use the following metrics for evaluation:
- AlignScore (here)
- GPT-4 Likert Score on a scale of 1-5 (see the sketch after this list)
- GPT-4 pairwise score
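As an illustration of the GPT-4 Likert evaluation, here is a minimal sketch using the OpenAI Python client; the prompt wording and model name are assumptions for illustration, not the exact prompt used in the paper.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def gpt4_likert_score(document: str, summary: str) -> str:
    """Ask GPT-4 for a 1-5 factual-consistency rating (illustrative prompt, not the paper's)."""
    prompt = (
        "Rate the factual consistency of the summary with respect to the document "
        "on a scale of 1 (completely unfaithful) to 5 (fully faithful). "
        "Reply with a single number.\n\n"
        f"Document:\n{document}\n\nSummary:\n{summary}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()
```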