writing.exchange is one of the many independent Mastodon servers you can use to participate in the fediverse.
A small, intentional community for poets, authors, and every kind of writer.

Server stats: 335 active users

#rl

4 posts · 4 participants · 0 posts today
Ukraine War Bulletins and News<p><a href="https://youtu.be/lcMZIG16l_k" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">youtu.be/lcMZIG16l_k</span><span class="invisible"></span></a><br>⛔️🇺🇸What Trump and Elon Musk’s DOGE’s shutdown of Radio Free Europe means for free speech | Focus on Europe (DW - German News in English VIDEO) <a href="https://mastodon.online/tags/Ukraine" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Ukraine</span></a> <a href="https://mastodon.online/tags/Mastodon" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Mastodon</span></a> <a href="https://mastodon.online/tags/BoycottTesla" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>BoycottTesla</span></a> <a href="https://mastodon.online/tags/BoycottMusk" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>BoycottMusk</span></a> <a href="https://mastodon.online/tags/BoycottX" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>BoycottX</span></a> <a href="https://mastodon.online/tags/Musk" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Musk</span></a> <a href="https://mastodon.online/tags/ElonMusk" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ElonMusk</span></a> <a href="https://mastodon.online/tags/Tesla" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Tesla</span></a> <a href="https://mastodon.online/tags/RFE" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RFE</span></a> <a href="https://mastodon.online/tags/RL" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RL</span></a> <a href="https://mastodon.online/tags/OSCE" class="mention hashtag" rel="nofollow noopener noreferrer" 
target="_blank">#<span>OSCE</span></a> <a href="https://mastodon.online/tags/PACE" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>PACE</span></a> <a href="https://mastodon.online/tags/Germany" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Germany</span></a> <a href="https://mastodon.online/tags/France" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>France</span></a> <a href="https://mastodon.online/tags/NukesForUkraine" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>NukesForUkraine</span></a> <a href="https://mastodon.online/tags/SouthKorea" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>SouthKorea</span></a> <a href="https://mastodon.online/tags/Japan" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Japan</span></a> <a href="https://mastodon.online/tags/Press" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Press</span></a> <a href="https://mastodon.online/tags/Taiwan" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Taiwan</span></a> <a href="https://mastodon.online/tags/Media" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Media</span></a> <a href="https://mastodon.online/tags/NukesOrNATO" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>NukesOrNATO</span></a> <a href="https://mastodon.online/tags/USA" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>USA</span></a> <a href="https://mastodon.online/tags/US" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>US</span></a> <a href="https://mastodon.online/tags/UK" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>UK</span></a> <a href="https://mastodon.online/tags/EU" class="mention 
hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>EU</span></a> <a href="https://mastodon.online/tags/NATO" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>NATO</span></a> <a href="https://mastodon.online/tags/News" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>News</span></a> <a href="https://mastodon.online/tags/UnitedStates" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>UnitedStates</span></a> <a href="https://mastodon.online/tags/EuropeanUnion" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>EuropeanUnion</span></a> <a href="https://mastodon.online/tags/UnitedKingdom" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>UnitedKingdom</span></a> <a href="https://mastodon.online/tags/russiaUkraineWar" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>russiaUkraineWar</span></a> <a href="https://mastodon.online/tags/11yrInvasionofUkraine" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>11yrInvasionofUkraine</span></a> <a href="https://mastodon.online/tags/RussiaIsATerroristState" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RussiaIsATerroristState</span></a></p>
Minh Trinh<p>A talk on Reinforcement Learning in Reasoning Models <a href="https://www.youtube.com/watch?v=W47jVRQ67Wc" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="">youtube.com/watch?v=W47jVRQ67Wc</span><span class="invisible"></span></a></p><p><a href="https://mastodon.online/tags/RL" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RL</span></a> <a href="https://mastodon.online/tags/reinforcementlearning" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>reinforcementlearning</span></a> <a href="https://mastodon.online/tags/ai" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ai</span></a> <a href="https://mastodon.online/tags/artificialintelligence" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>artificialintelligence</span></a></p>
Yappari<p><span class="h-card" translate="no"><a href="https://journa.host/@w7voa" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>w7voa</span></a></span> </p><p>⬆️⬆️⬆️ Here is the <a href="https://universeodon.com/tags/Trump" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Trump</span></a> administration following court orders regarding funding of <a href="https://universeodon.com/tags/RadioFreeEurope" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RadioFreeEurope</span></a> <a href="https://universeodon.com/tags/RFE" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RFE</span></a> <a href="https://universeodon.com/tags/RL" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RL</span></a></p><p><a href="https://universeodon.com/tags/RuleOfLaw" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RuleOfLaw</span></a> <a href="https://universeodon.com/tags/fedilaw" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>fedilaw</span></a></p>
JNSLCT<p><a href="https://mastodon.social/tags/SiliconCurtain" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>SiliconCurtain</span></a> </p><p>Leyla Latypova - RL &amp; RFE Closure Silences Indigenous Voices Struggling Against Russian Imperialism.</p><p><a href="https://www.youtube.com/watch?v=KdMawwzGG08" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="">youtube.com/watch?v=KdMawwzGG08</span><span class="invisible"></span></a></p><p>Apart from destroying democracy <a href="https://mastodon.social/tags/trump" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>trump</span></a> and <a href="https://mastodon.social/tags/musk" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>musk</span></a> behave like <a href="https://mastodon.social/tags/ISIS" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ISIS</span></a> destroying historical statues (the archives).</p><p><a href="https://mastodon.social/tags/LeylaLatypova" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LeylaLatypova</span></a> <a href="https://mastodon.social/tags/Latypova" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Latypova</span></a> <a href="https://mastodon.social/tags/Tatarstan" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Tatarstan</span></a> <a href="https://mastodon.social/tags/Bashkortostan" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Bashkortostan</span></a> <a href="https://mastodon.social/tags/Chechnya" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Chechnya</span></a> <br><a href="https://mastodon.social/tags/RL" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RL</span></a> <a 
href="https://mastodon.social/tags/RadioLiberty" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RadioLiberty</span></a> <a href="https://mastodon.social/tags/RFE" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RFE</span></a> <a href="https://mastodon.social/tags/RadioFreeEurope" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RadioFreeEurope</span></a> <br><a href="https://mastodon.social/tags/ruSSia" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ruSSia</span></a> <a href="https://mastodon.social/tags/racist" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>racist</span></a> <a href="https://mastodon.social/tags/fascist" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>fascist</span></a> <a href="https://mastodon.social/tags/kleptocrat" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>kleptocrat</span></a> <a href="https://mastodon.social/tags/maffia" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>maffia</span></a> <a href="https://mastodon.social/tags/terror" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>terror</span></a> </p><p><a href="https://mastodon.social/tags/Ukraine" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Ukraine</span></a> <a href="https://mastodon.social/tags/UkraineWar" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>UkraineWar</span></a> <a href="https://mastodon.social/tags/StandWithUkraine" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>StandWithUkraine</span></a></p>
FunGuy2PlayWith<p>2+2=5<br>"America destroys one of its own symbols," by <a href="https://mastodon.online/tags/StanislavAseyev" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>StanislavAseyev</span></a> <br><a href="https://mastodon.online/tags/TimothySnyder" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>TimothySnyder</span></a> <a href="https://mastodon.online/tags/Trump" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Trump</span></a> <a href="https://mastodon.online/tags/RFE" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RFE</span></a> <a href="https://mastodon.online/tags/RL" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RL</span></a> </p><p><a href="https://open.substack.com/pub/snyder/p/225?r=6jfho&amp;utm_campaign=post&amp;utm_medium=web&amp;showWelcomeOnShare=false" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">open.substack.com/pub/snyder/p</span><span class="invisible">/225?r=6jfho&amp;utm_campaign=post&amp;utm_medium=web&amp;showWelcomeOnShare=false</span></a></p>
Ukraine War Bulletins and News<p><a href="https://youtu.be/0HN3IfmOEg8" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">youtu.be/0HN3IfmOEg8</span><span class="invisible"></span></a><br>🔴🇺🇸EU searches for help after Trump cuts hit Radio Free Europe (Reuters News VIDEO) <a href="https://mastodon.online/tags/Ukraine" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Ukraine</span></a> <a href="https://mastodon.online/tags/Mastodon" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Mastodon</span></a> <a href="https://mastodon.online/tags/RFE" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RFE</span></a> <a href="https://mastodon.online/tags/RFA" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RFA</span></a> <a href="https://mastodon.online/tags/RL" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RL</span></a> <a href="https://mastodon.online/tags/NukesForUkraine" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>NukesForUkraine</span></a> <a href="https://mastodon.online/tags/SouthKorea" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>SouthKorea</span></a> <a href="https://mastodon.online/tags/Press" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Press</span></a> <a href="https://mastodon.online/tags/News" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>News</span></a> <a href="https://mastodon.online/tags/Taiwan" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Taiwan</span></a> <a href="https://mastodon.online/tags/Media" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Media</span></a> <a href="https://mastodon.online/tags/Japan" 
class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Japan</span></a> <a href="https://mastodon.online/tags/NukesOrNATO" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>NukesOrNATO</span></a> <a href="https://mastodon.online/tags/USA" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>USA</span></a> <a href="https://mastodon.online/tags/US" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>US</span></a> <a href="https://mastodon.online/tags/UK" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>UK</span></a> <a href="https://mastodon.online/tags/EU" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>EU</span></a> <a href="https://mastodon.online/tags/NATO" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>NATO</span></a> <a href="https://mastodon.online/tags/UnitedStates" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>UnitedStates</span></a> <a href="https://mastodon.online/tags/UnitedKingdom" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>UnitedKingdom</span></a> <br><a href="https://mastodon.online/tags/EuropeanUnion" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>EuropeanUnion</span></a> <a href="https://mastodon.online/tags/russiaUkraineWar" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>russiaUkraineWar</span></a> <br><a href="https://mastodon.online/tags/11yrInvasionOfUkraine" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>11yrInvasionOfUkraine</span></a><br><a href="https://mastodon.online/tags/RussiaIsATerroristState" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RussiaIsATerroristState</span></a></p>
The Kyiv Independent [unofficial]<p><strong>Ukraine Daily summary - Sunday, March 16 2025</strong></p> Russia readying to attack Sumy as Donbas front stabilizes -- 'Putin is lying to everyone' — Zelensky calls for 'strong pressure' on Russia after UK summit -- Duda denounces Russia for 'imperial greed,' reiterates calls to deploy US nuclear weapons in Poland -- Russia attacks Ukraine with 178 drones overnight, targets energy infrastructure -- and more <p><a href="https://writeworks.uk/~/UkraineDaily/Ukraine%20Daily%20summary%20-%20%20Sunday,%20March%2016%202025/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">writeworks.uk/~/UkraineDaily/U</span><span class="invisible">kraine%20Daily%20summary%20-%20%20Sunday,%20March%2016%202025/</span></a></p>
Jellyman<p><a href="https://mastodon.social/tags/rl" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>rl</span></a> <a href="https://mastodon.social/tags/rocketleague" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>rocketleague</span></a> <a href="https://mastodon.social/tags/gaming" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>gaming</span></a> <a href="https://mastodon.social/tags/Jellyman" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>Jellyman</span></a> <a href="https://mastodon.social/tags/funnymoments" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>funnymoments</span></a></p>
ma𝕏pool<p>Self-Improving Reasoners.</p><p>Both expert human problem solvers and successful language models employ four key cognitive behaviors:</p><p>1. verification (systematic error-checking), </p><p>2. backtracking (abandoning failing approaches), </p><p>3. subgoal setting (decomposing problems into manageable steps), and </p><p>4. backward chaining (reasoning from desired outcomes to initial inputs). </p><p>Some language models naturally exhibit these reasoning behaviors and show substantial gains, while others don't and quickly plateau. </p><p>The presence of reasoning behaviors, not the correctness<br>of answers, is the critical factor. Models trained on incorrect solutions that contain proper reasoning patterns achieve performance comparable to those trained on correct solutions. </p><p>It seems that the presence of cognitive behaviors enables self-improvement through RL. </p><p>Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs<br><a href="https://arxiv.org/abs/2503.01307" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arxiv.org/abs/2503.01307</span><span class="invisible"></span></a></p><p><a href="https://mathstodon.xyz/tags/reinforcementlearning" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>reinforcementlearning</span></a> <a href="https://mathstodon.xyz/tags/RL" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RL</span></a> <br><a href="https://mathstodon.xyz/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://mathstodon.xyz/tags/DL" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>DL</span></a> <a href="https://mathstodon.xyz/tags/LLM" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLM</span></a></p>
ma𝕏pool<p>Richard Sutton and Andrew Barto Win 2024 Turing Award <a href="https://awards.acm.org/about/2024-turing" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">awards.acm.org/about/2024-turi</span><span class="invisible">ng</span></a></p><p>Andrew Barto and Richard Sutton Recognized as Pioneers of Reinforcement Learning</p><p><a href="https://mathstodon.xyz/tags/compsci" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>compsci</span></a> <a href="https://mathstodon.xyz/tags/turingaward" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>turingaward</span></a> <a href="https://mathstodon.xyz/tags/computerscience" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>computerscience</span></a> <a href="https://mathstodon.xyz/tags/reinforcementlearning" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>reinforcementlearning</span></a> <a href="https://mathstodon.xyz/tags/RL" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RL</span></a> <a href="https://mathstodon.xyz/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a></p>
Ben Lorica 罗瑞卡<p>SFT vs. RFT: Choosing the Right Fine-Tuning Strategy for Your AI<br>✅ Customizing foundation models has become essential for organizations seeking to create differentiated value<br><a href="https://indieweb.social/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://indieweb.social/tags/LLM" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLM</span></a> <a href="https://indieweb.social/tags/RL" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RL</span></a><br><a href="https://gradientflow.com/post-training-rft-sft-rlhf/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">gradientflow.com/post-training</span><span class="invisible">-rft-sft-rlhf/</span></a></p>
Goncalo Gordo<p>winning submission for the second Tinker AI competition (more detail can be found here <a href="https://tinkerai.run/experiments/67c173156497f18e04c24314/" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">tinkerai.run/experiments/67c17</span><span class="invisible">3156497f18e04c24314/</span></a>) <a href="https://mastodon.social/tags/robot" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>robot</span></a> <a href="https://mastodon.social/tags/ai" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ai</span></a> <a href="https://mastodon.social/tags/rl" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>rl</span></a></p>
Stuart Spence<p>The latest Karpathy video is a great semi-technical overview of LLMs and related concepts:</p><p><a href="https://mstdn.ca/tags/llm" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>llm</span></a> <a href="https://mstdn.ca/tags/ai" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ai</span></a> <a href="https://mstdn.ca/tags/karpathy" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>karpathy</span></a> <a href="https://mstdn.ca/tags/chatgpt" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>chatgpt</span></a> <a href="https://mstdn.ca/tags/llama" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>llama</span></a> <a href="https://mstdn.ca/tags/gemini" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>gemini</span></a> <a href="https://mstdn.ca/tags/rlhf" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>rlhf</span></a> <a href="https://mstdn.ca/tags/rl" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>rl</span></a></p><p><a href="https://youtu.be/7xTGNNLPyMI?feature=shared" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">youtu.be/7xTGNNLPyMI?feature=s</span><span class="invisible">hared</span></a></p>
Tero Keski-Valkama<p>Hear me out: I think applying RL on <a href="https://rukii.net/tags/LLMs" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLMs</span></a> and LMMs is misguided, and we can do much better.</p><p>Those <a href="https://rukii.net/tags/RL" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RL</span></a> algorithms are unsuitable for this; for example, they cannot learn how their decisions affect the eventual rewards, but are instead just optimized to make decisions based on Bellman optimization.</p><p>Instead we can simply condition the LLMs on the rewards. The rewards become inputs to the model, not something external to it, so the model learns the proper reward dynamics instead of only being externally forced towards the rewards. The model can itself do the credit assignment optimally without fancy mathematical heuristics!</p><p>This isn't a new idea; it comes from goal-conditioned RL and decision transformers.</p><p>We can simply run the reasoning trajectories, judge the outcomes, and then prepend the outcome tokens to these trajectories before training the model on them in a batch.</p><p><a href="https://arxiv.org/abs/2211.15657" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arxiv.org/abs/2211.15657</span><span class="invisible"></span></a></p>
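The data-preparation step described in the post above (run trajectories, judge outcomes, put an outcome token in front of each trajectory) might be sketched roughly as follows. This is only an illustrative sketch, not the author's implementation: the `Trajectory` type, the outcome token strings, and the exact-match `judge` function are all hypothetical placeholders, and the actual fine-tuning step is left out.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical outcome tokens; a real setup would add these to the tokenizer vocabulary.
GOOD = "<|outcome:good|>"
BAD = "<|outcome:bad|>"


@dataclass
class Trajectory:
    """One sampled reasoning trajectory (illustrative structure, not from the post)."""
    prompt: str
    reasoning: str
    answer: str


def judge(traj: Trajectory, reference: str) -> bool:
    """Toy judge (an assumption): exact match against a reference answer."""
    return traj.answer.strip() == reference.strip()


def to_conditioned_example(traj: Trajectory, reference: str) -> str:
    """Prepend the outcome token so the model sees the outcome as an input."""
    token = GOOD if judge(traj, reference) else BAD
    return f"{token}\n{traj.prompt}\n{traj.reasoning}\n{traj.answer}"


def build_batch(trajs: List[Trajectory], reference: str) -> List[str]:
    """Turn judged trajectories into outcome-conditioned training strings."""
    return [to_conditioned_example(t, reference) for t in trajs]


batch = build_batch(
    [
        Trajectory("What is 2+2?", "2 plus 2 is 4.", "4"),
        Trajectory("What is 2+2?", "2 plus 2 is 5.", "5"),
    ],
    reference="4",
)
```

Both successful and failed trajectories stay in the batch; under this scheme the model would be prompted with the good-outcome token at inference time.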
Tero Keski-Valkama<p>How to formulate the exploration-exploitation trade-off better than all the hacks on top of the Bellman equation?</p><p>We can first of all simply estimate the advantage of exploration by Monte Carlo in a swarm setting: pitting fully exploitative agents against fully exploitative agents which have the benefit of recent exploration. This can be easily done by lagging policy models.</p><p>Of course the advantage of exploration needs to be divided by the cost of exploration, which is linear in the number of agents used in the swarm to explore at a particular state.</p><p>Note that the advantage of exploration depends on the state of the agent, so we might want to define an explorative critic to estimate this.</p><p>What's beautiful in this formulation is that we can incorporate autoregressive <a href="https://rukii.net/tags/WorldModels" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>WorldModels</span></a> naturally, as the exploitative agents only learn from rewards, but the explorative agents choose their actions in a way which maximizes the improvement of the auto-regressive World Model.</p><p>It brings these two concepts together as sides of the same coin.</p><p>Exploitation is reward-guided action; exploration is action guided by improvement of the auto-regressive state transition model.</p><p>Balancing the two is a swarm dynamic which encourages branching where exploration has an expected value in reward terms. 
This can be estimated by computing the advantage of exploitative agents utilizing recent exploration versus agents which do not, and returning this advantage to the points of divergence between the two.</p><p><a href="https://rukii.net/tags/mathematics" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>mathematics</span></a> <a href="https://rukii.net/tags/ReinforcementLearning" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ReinforcementLearning</span></a> <a href="https://rukii.net/tags/RL" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>RL</span></a> <a href="https://rukii.net/tags/AI" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>AI</span></a> <a href="https://rukii.net/tags/LLMs" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>LLMs</span></a></p>
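As a rough illustration of the estimate described in the post above, the cost-normalized advantage of exploration could be computed from Monte Carlo returns like this. It is a sketch under stated assumptions: the function name, the `cost_per_explorer` parameter, and the use of a plain difference of mean returns are illustrative choices, not something the post specifies.

```python
from statistics import mean
from typing import Sequence


def exploration_value(
    returns_with_exploration: Sequence[float],
    returns_without_exploration: Sequence[float],
    num_explorers: int,
    cost_per_explorer: float = 1.0,  # hypothetical unit cost per exploring agent
) -> float:
    """Monte Carlo advantage of exploration divided by its cost.

    The two return lists come from fully exploitative agents: one group uses a
    policy that has benefited from recent exploration, the other uses a lagged
    policy that has not. The cost is assumed linear in the number of explorers.
    """
    if num_explorers <= 0:
        raise ValueError("num_explorers must be positive")
    advantage = mean(returns_with_exploration) - mean(returns_without_exploration)
    cost = cost_per_explorer * num_explorers
    return advantage / cost


# Mean returns 4.0 vs 2.0, four explorers at unit cost: (4.0 - 2.0) / 4
value = exploration_value([3.0, 5.0], [2.0, 2.0], num_explorers=4)
```

A state-dependent version would fit a critic to these values instead of averaging over whole rollouts, which is what the "explorative critic" in the post suggests.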
Antonín Kindl<p>Running away from the doc! 👨‍⚕️👨🏻‍🔧</p><p><a href="https://defcon.social/tags/machine" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>machine</span></a> <a href="https://defcon.social/tags/robots" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>robots</span></a> <a href="https://defcon.social/tags/robotics" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>robotics</span></a> <a href="https://defcon.social/tags/evolution" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>evolution</span></a> <a href="https://defcon.social/tags/robot" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>robot</span></a> <a href="https://defcon.social/tags/algorithms" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>algorithms</span></a> <a href="https://defcon.social/tags/reinforcementlearning" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>reinforcementlearning</span></a> <a href="https://defcon.social/tags/rl" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>rl</span></a> <a href="https://defcon.social/tags/system" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>system</span></a> <a href="https://defcon.social/tags/ai" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ai</span></a> <a href="https://defcon.social/tags/intelligence" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>intelligence</span></a> <a href="https://defcon.social/tags/bot" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>bot</span></a> <a href="https://defcon.social/tags/arduino" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>arduino</span></a> <a 
href="https://defcon.social/tags/esp32" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>esp32</span></a> <a href="https://defcon.social/tags/hw" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>hw</span></a> <a href="https://defcon.social/tags/animatronic" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>animatronic</span></a> </p><p><a href="https://kindl.work/2025/System+(in+progress)" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">kindl.work/2025/System+(in+pro</span><span class="invisible">gress)</span></a></p>
Antonín Kindl<p><a href="https://defcon.social/tags/machine" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>machine</span></a> <a href="https://defcon.social/tags/robots" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>robots</span></a> <a href="https://defcon.social/tags/robotics" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>robotics</span></a> <a href="https://defcon.social/tags/evolution" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>evolution</span></a> <a href="https://defcon.social/tags/robot" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>robot</span></a> <a href="https://defcon.social/tags/algorithms" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>algorithms</span></a> <a href="https://defcon.social/tags/reinforcementlearning" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>reinforcementlearning</span></a> <a href="https://defcon.social/tags/rl" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>rl</span></a> <a href="https://defcon.social/tags/system" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>system</span></a> <a href="https://defcon.social/tags/ai" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ai</span></a> <a href="https://defcon.social/tags/intelligence" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>intelligence</span></a> <a href="https://defcon.social/tags/bot" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>bot</span></a> <a href="https://defcon.social/tags/arduino" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>arduino</span></a> <a href="https://defcon.social/tags/esp32" class="mention hashtag" 
rel="nofollow noopener noreferrer" target="_blank">#<span>esp32</span></a> <a href="https://defcon.social/tags/hw" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>hw</span></a> <a href="https://defcon.social/tags/animatronic" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>animatronic</span></a></p>
Goncalo Gordo<p>winning submission for the first Tinker AI competition (25m in 5.2sec)! <a href="https://mastodon.social/tags/robot" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>robot</span></a> <a href="https://mastodon.social/tags/ai" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>ai</span></a> <a href="https://mastodon.social/tags/rl" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>rl</span></a></p>