open-r1

Author	SHA1	Message	Date
Edward Beeching	715c8787fb	add back grad accumulations steps (#612)	2025-04-17 16:41:39 +02:00
lewtun	4c9b0f25d9	Fix TP once and for all :) (#613) * Update evaluation.py * Fix import	2025-04-17 15:25:59 +02:00
lewtun	14a81d2bd4	Update evaluation.py (#611)	2025-04-17 11:11:49 +02:00
lewtun	5112bfc401	Fix SFT for base models (#604) * Fix pad token bug in SFT * Add ChatML default * Clean up * Refactor grpo model load * Add doc * Bump deepspeed	2025-04-16 11:45:50 +02:00
lewtun	bcbb1da401	Update evaluation.py (#608)	2025-04-16 10:37:38 +02:00
lewtun	8eb1b7860a	Set DP=1 due to vLLM <> LightEval hanging (#600) * Update evaluate.slurm * Disable DP * Fix	2025-04-16 10:24:33 +02:00
lewtun	8cf42663fd	Clean up recipes (#596)	2025-04-11 20:09:15 +02:00
Edward Beeching	068f13f236	Hotfix bin reward (#597) * add WIP code GRPO configs * hotfix bin reward * remove unwanted files * remove configs	2025-04-11 17:45:38 +02:00
lewtun	04dbf21989	Bump TRL and vLLM (#595) * Bump TRL and vLLM * Fix style * Bump liger * Add liger	2025-04-11 16:32:33 +02:00
Edward Beeching	c1eadaa097	E2B Router bug fixes (#592) * fix eval system prompt * style * fix a rare issue where the execution is None * fixes a bug in the e2b router	2025-04-11 14:04:59 +02:00
Edward Beeching	3a0e89678c	Fix eval system prompt (#591) * fix eval system prompt * style	2025-04-11 11:23:06 +02:00
Shenghang Tsai	2a7bb45f05	Update README.md (#590)	2025-04-10 13:11:35 +02:00
lewtun	bf08f56849	[WIP] Bump lighteval with proper pass@1 (#584) * Bump lighteval with proper pass@1 * Bump lighteval * Update AIME24	2025-04-08 20:53:34 +02:00
Edward Beeching	1b3bf043dc	Adds a E2B router server that executes batches of scripts (#561) * adds a dedicated e2b server to handle batches of requests * fix reward tests * update slow reward * style * updates e2b router to be more generic * refactor * refactoring * licence, cleanup * update tests * style * fix import when e2b not present * style * rename sandbox file * rename to RoutedSandbox * update readme * nits * nits2 * unlimited max time * update logs path	2025-04-07 21:01:06 +02:00
lewtun	2636a2130f	Add WandB groups to logging (#573)	2025-04-02 15:48:59 +02:00
lewtun	ca8664df1c	Fix missing prompt columns in recipes (#574)	2025-04-02 15:48:48 +02:00
lewtun	4f5b21e21d	Fix accuracy reward for math (#566) * Fix accuracy reward for math * Add typing * Add unit test * Return None for invalid samples * Fix order of answers * Fix type * Use None for non-verifiable answers	2025-04-01 12:04:26 +02:00
Edward Beeching	9915e06f1e	Async code reward fixes (#546) * expose num parallel code executions * add e2b benchmarking script * adds new parallel code execution with better exception handling * style * update default * increase sandbox timeout * Add pretty table and Sandbox IDs * Add Sandbox ID * fix merge --------- Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>	2025-03-28 14:08:15 +01:00
Zhou Shao	1802bec75f	fix dataset parsing error (#540) * fix dataset parsing error support defined question field to fix errors when datasets' question field is not 'problem' * add question field config add script_args: question field * refactor: datasets prompt column --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-03-28 13:17:04 +01:00
lewtun	4ec555b0c8	Restore single-node instructions to run GRPO (#549)	2025-03-27 10:29:07 +01:00
lewtun	8000dd2384	[WIP] RL goes brrr (#533) * Fix vLLM recipes * Add vllm server to Slurm * Add overlap across srun * Fix NUM_NODES * Refactor TP to script * fix train script to work with new GRPO * lewis nits * bump trl, transformers --------- Co-authored-by: edbeeching <edbeeching@gmail.com>	2025-03-24 15:15:02 +01:00
Edward Beeching	86f9471f8e	Fixes missing exception in run_script (#532) * adds binary code reward, refactors grpo with get_reward_funcs * adds return type to the function * fix exception in run_script causes batch of rewards to be zero * style	2025-03-24 13:36:48 +01:00
Zhou Shao	9409dca675	fix get_reward_funcs bug (#535) change the input from `script_args.reward_funcs` to `script_args`	2025-03-22 15:33:21 +01:00
Edward Beeching	af487204ca	Adds binary code reward (#528) * adds binary code reward, refactors grpo with get_reward_funcs * adds return type to the function * add get_reward_funcs test * remove type hint * move script args to another file * update test	2025-03-21 12:53:38 +01:00
Guilherme Penedo	7835979801	adds support for running GRPO on IOI problems (#495) * adds support for running GRPO on IOI problems * nit * bugfixes + recipe * added piston info and readme changes * readme updates * run isort to fix checks * Update src/open_r1/rewards.py Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com> * adding ioi test * fix merge issues with python slow tests * style * generalize piston workers * generalize readme * fix extract code * finalize slow tests --------- Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com> Co-authored-by: edbeeching <edbeeching@gmail.com>	2025-03-21 08:48:00 +01:00
koskotheim	d436b7b9c0	fix typo (#507)	2025-03-15 20:56:14 +01:00
Edward Beeching	5dcfae8979	Fixes bug with async code reward (#504) * adds slow test for code reward * fixes bug in setting language and the output parsing * style * removed redundant comment * removed exception as e * remove rewards * removed whitespace * more whitespace * remove need for loop with asyncio.run * nits * fix type error with e2b AsyncSandbox	2025-03-13 22:54:15 +01:00
lewtun	d5922af8ce	Add OlympicCoder recipes (#505) * Add OlympicCoder recipes * Fix configs * Add FSDP config	2025-03-13 19:08:34 +01:00
Agus	9890a8d992	Run e2b async sandbox by default (#484) * Run e2b async sandbox by default * Remove unnecessary rewards * Fix run_sync variable * Run linters * Let only async version * Remove unused Sandbox --------- Co-authored-by: agus <agustin.piqueres@huggingface.com>	2025-03-08 15:41:15 +01:00
A-transformer	88a1b002c1	missing ' (#479) missing '	2025-03-07 15:44:15 +01:00
A-transformer	6660a477ec	Remove unimplemented 'format_deepseek' from reward_funcs documentation (#480) Removed 'format_deepseek' from GRPOScriptArguments help string as it is not implemented in REWARD_FUNCS_REGISTRY and format_reward already covers DeepSeek-style formatting needs.	2025-03-06 10:45:50 +01:00
lewtun	3b5d6603bf	Add citation and acknowledgements (#481) * Update README.md * Update README.md * Update README.md	2025-03-05 20:23:57 +01:00
lewtun	a465641ec7	Fix make evaluate (#470)	2025-03-04 14:25:58 +01:00
lewtun	299446902d	Enable decontamination on dataset configs (#460)	2025-03-04 09:22:01 +01:00
lewtun	44cb13d4ba	Fix vLLM (#464)	2025-03-03 17:25:30 +01:00
A-transformer	4b4c377f27	fix typo (#459) fix typo	2025-03-03 15:33:23 +01:00
lewtun	45ccf60109	Remove dataset_configs from YAML recipes (#461)	2025-03-03 13:54:58 +01:00
Marco Z	c7733d3fa4	update makefile and readme (#449) Co-authored-by: Marco Zocca <marco.zocca@unfoldml.com>	2025-03-01 15:08:30 +01:00
Edward Beeching	8782fa6e90	bump lighteval, expose the lcb_v4 benchmark (#441)	2025-02-26 17:59:44 +01:00
Edward Beeching	a20666d5b5	Bumps TRL (#437)	2025-02-26 10:35:50 +01:00
elie	3ba56c1c3d	Add config sft smollm (#425) * add sft recipe * add smollm sft * max_length modif 1 * max_length modif 2	2025-02-25 21:45:59 +01:00
Nile Zhou	d036a1b341	fix reward verify err (#430)	2025-02-25 16:15:00 +01:00
Edward Beeching	11beb9a4dc	Updates evals to run with ddp=8 for small models (#428) Currently the logic for calculating num_gpus considers eval in the TP setting; for the Qwen 7b models this returns 4. However, for smaller models we can use DDP and fix the num_gpus at 8	2025-02-25 16:00:13 +01:00
Agus	7188001281	Add script to decontaminate datasets against benchmark datasets (#416) * Add script to decontaminate datasets against benchmark datasets * Add docs for the decontamination script * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update scripts/decontaminate.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update scripts/decontaminate.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update scripts/decontaminate.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update scripts/decontaminate.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update scripts/decontaminate.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Add license header and attribution to the authors --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-02-24 19:54:44 +01:00
Edward Beeching	0c3ef8372e	updates max_seq_length to max length due to a bug in trl (#419)	2025-02-24 17:27:56 +01:00
lewtun	566cfd1a44	Align format reward with R1 traces and add reward function to count think / answer tags (#418) * Fix tests * Tune * Add reward * Apply suggestions from code review Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-02-24 17:16:40 +01:00
elie	5355687e6c	add sft recipe (#415)	2025-02-24 15:43:12 +01:00
lewtun	3f9d75a595	Bump Liger kernel (#399) Needed to enable SFT training via https://github.com/huggingface/trl/pull/2874	2025-02-23 17:44:03 +01:00
lewtun	eeca246b07	Update prompt template and sampling parameters for evaluation (#392) * Pin t * Pin t * Set top p * C * Tune math prompt * Improve math prompt * Update tables	2025-02-22 15:21:01 +01:00
lewtun	49d9b741a5	Pin dependencies (#393)	2025-02-22 14:46:09 +01:00

170 Commits