Apr 20, 2026

[AI] Tuning jargon

Supervised Fine-Tuning (SFT)

Parameter-Efficient Fine-Tuning (PEFT):
LoRA / QLoRA: The industry standard. Instead of updating the whole model, you train small low-rank "adapter" matrices on top of the frozen weights; QLoRA does the same with the base model quantized to 4-bit. This can cut VRAM requirements by up to 90%.
DoRA (Weight-Decomposed Low-Rank Adaptation): A newer 2025/2026 favorite that decomposes each pretrained weight into magnitude and direction components and tunes them separately, often yielding better results than LoRA.
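The core LoRA trick can be sketched in a few lines of numpy. This is a toy illustration of the math, not any library's implementation; the sizes and the scaling hyperparameter are hypothetical but typical:

```python
import numpy as np

# LoRA sketch: instead of updating a full d x k weight matrix W, train two
# low-rank factors B (d x r) and A (r x k) and add their product as a delta.
d, k, r = 4096, 4096, 8           # hidden sizes and a small rank (illustrative)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, k))   # frozen pretrained weight
A = rng.standard_normal((r, k)) * 0.01
B = np.zeros((d, r))              # B starts at zero, so the initial delta is zero

alpha = 16                        # LoRA scaling hyperparameter
W_adapted = W + (alpha / r) * (B @ A)

full_params = d * k               # what full fine-tuning would train
lora_params = r * (d + k)         # what LoRA actually trains
print(f"trainable: {lora_params:,} vs {full_params:,} "
      f"({100 * (1 - lora_params / full_params):.1f}% fewer)")
```

At these sizes LoRA trains 65,536 parameters instead of ~16.8 million for this one matrix, which is where the large VRAM savings come from.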


SDFT (Self-Distillation Fine-Tuning): A breakthrough method (popularized by MIT in early 2026) where the model uses its own reasoning to generate better training data for itself, reducing the need for human-labeled datasets.

SFT vs. RLHF (The "Teacher" vs. the "Critic")
It is helpful to think of SFT as the first step in a two-part education:

SFT (The Teacher): Tells the model, "Here is exactly how a good answer looks. Copy this."
RLHF/RFT (The Critic): Comes after SFT. It tells the model, "You gave me three answers; this one is better than that one." This is used for "alignment"—making the model safer, more polite, or better at complex reasoning (like DeepSeek-R1 or OpenAI’s o1).
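The Teacher/Critic split shows up directly in the training data. A minimal sketch of the difference, with illustrative field names rather than any specific library's schema:

```python
# SFT ("the Teacher"): each example pairs a prompt with one gold answer to imitate.
sft_example = {
    "prompt": "Summarize: The meeting moved to Friday.",
    "completion": "The meeting was rescheduled to Friday.",
}

# RLHF/RFT-style preference data ("the Critic"): the same prompt with two
# candidate answers, ranked by a human or a reward model.
preference_example = {
    "prompt": "Summarize: The meeting moved to Friday.",
    "chosen": "The meeting was rescheduled to Friday.",
    "rejected": "Friday meeting thing happened, probably.",
}

# The Critic never sees "the right answer" directly, only the ranking.
print("completion" in sft_example, "chosen" in preference_example)
```

This is why RLHF comes second: you need a model that already produces plausible candidates before ranking them teaches it anything useful.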


Why Use SFT?

Domain Expertise: Teaching a model medical, legal, or proprietary company jargon.
Style/Voice: Ensuring the AI sounds like your specific brand (e.g., "Professional yet cheeky").
Format Constraints: Forcing the model to always output valid JSON or specific code structures.
Efficiency: A fine-tuned 7B model can often outperform a generic 70B model on a specific, narrow task.
Pro Tip: In 2026, the mantra is "Quality over Quantity." 1,000 extremely high-quality, human-verified examples will almost always result in a better model than 50,000 noisy, machine-generated ones.
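The "Format Constraints" use case and the Pro Tip combine naturally: if every completion in your dataset is valid JSON, the model learns to emit it, and a cheap parse check keeps noisy examples out. A sketch using a hypothetical chat-style record (the field names mirror common chat formats but are illustrative):

```python
import json

# Hypothetical SFT record for the format-constraints use case: the target
# completion is strict JSON.
record = {
    "messages": [
        {"role": "system", "content": "Reply only with valid JSON."},
        {"role": "user", "content": "Extract the city from: 'Ship to Oslo by May.'"},
        {"role": "assistant", "content": '{"city": "Oslo"}'},
    ]
}

def is_valid_example(rec: dict) -> bool:
    """Quality gate in the 'quality over quantity' spirit: drop any example
    whose target output does not parse as JSON."""
    try:
        json.loads(rec["messages"][-1]["content"])
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_example(record))  # True: the assistant turn parses cleanly
```

Running a filter like this over a candidate dataset before training is one concrete way to trade 50,000 noisy examples for a smaller, verified set.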
