Language Breakdown
Lines of code distribution across 7 owned repositories
I-Shaped Developer
I-shaped Specialist: deep expertise in Python
Collaboration Network
Global Impact visualization
Repos: 13
PRs: 0
Growth: +18%
Top Collaborators
No collaborator data yet.
Coding Streak
Contribution activity over the past year
Top Repositories
We propose Reinforcement Learning from Community Feedback (RLCF), a training paradigm that uses large-scale community signals as supervision, and formulate scientific taste learning as a preference modeling and alignment problem.
We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that Sora-2 surpasses GPT5 by 10% on eyeballing puzzles and reaches 69% accuracy on MMMU.
Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning
In this work, we investigate the compositionality of large language models (LLMs) in mathematical reasoning. Specifically, we construct a new dataset MATHTRAP by introducing carefully designed logical traps into the problem descriptions of MATH and GSM8K.
A curated list of awesome resources about reward construction for AI agents. This repository covers cutting-edge research and practical guides on defining and collecting rewards to build more intelligent and aligned AI agents.
first practice
Open Source Impact
Contributions to external projects
No external contributions found.