Jan 14, 2026

[C++] Cache-friendly

 Just always default to int if the usage is under contract.



Reference:
Object Lifetimes
alignas(int) unsigned char buffer[sizeof(int)]; // used for provenance contract.

One byte types:

Benchmarks show
char8_t is faster thanks to the optimizer.
The reason is that a pointer of char, unsigned char, or std::byte type may alias (point to) anything, thus
data.size() could potentially be modified inside the loop, so re-reading data.size() on every iteration is necessary.
char8_t has no such aliasing privilege, so data.size() is known to be fixed and can be kept in a register.




fix: old trick, hoist data.size() into a local (a register) ourselves.
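A minimal sketch of that trick, assuming data is a std::vector<char> (the function and names are hypothetical):

#include <cstddef>
#include <vector>

// Hoist data.size() into a local so it is read only once; writes through a
// char* may alias anything (including the vector's own bookkeeping), so
// otherwise the compiler must re-read the size every iteration.
void zero_all(std::vector<char>& data) {
    const std::size_t n = data.size(); // read once, can live in a register
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = 0;
    }
}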

or just:


Bit-fields are good for saving memory, but accessing them is slow (the other values have to be masked out).
Also, under metaprogramming, the deduced type of a bit-field has to be cast explicitly.
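A minimal sketch of the trade-off (the layout is hypothetical): reading flag_b from PackedFlags compiles to a load plus shift/mask, while the plain version is a single direct load.

#include <cstdint>

struct PackedFlags {            // 1 byte total
    std::uint8_t flag_a : 1;
    std::uint8_t flag_b : 3;
    std::uint8_t flag_c : 4;
};

struct PlainFlags {             // 3 bytes, but each field is a direct load
    std::uint8_t flag_a;
    std::uint8_t flag_b;
    std::uint8_t flag_c;
};

// In generic code, arithmetic on flag_b happens after integral promotion
// (as int), so storing the result back typically needs an explicit cast:
//     p.flag_b = static_cast<std::uint8_t>(p.flag_b + 1);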


Reference:

When choosing a container type to use, consider:


The L2 cache is used for both data and instructions,
i.e. how we write the code affects the speed, even when the results are the same.
e.g.
slower: the branch, when taken, causes the cached instructions to be replaced.

faster: the branch rarely happens, even though REPEAT64 has many more instructions than the simple count += d

Summary


Reference:

Always start with DoD; focus on algorithm.

DoD packs the data layout more tightly, which is good for caching.

SoA
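A minimal AoS-vs-SoA sketch (the struct and field names are hypothetical); in the SoA form each field is stored contiguously, so a loop that only touches positions streams through the cache without dragging the unused fields along.

#include <vector>

// Array of Structures (AoS): each element mixes hot and cold fields.
struct ParticleAoS {
    float x, y, z;
    float mass;
    int   id;
};
std::vector<ParticleAoS> particles_aos;

// Structure of Arrays (SoA): each field is stored contiguously.
struct ParticlesSoA {
    std::vector<float> x, y, z;
    std::vector<float> mass;
    std::vector<int>   id;
};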
Reference:
Align with the cache line to avoid false sharing

CS 101: keep the false-sharing effect in mind.
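A minimal sketch of aligning to the cache line to avoid false sharing, assuming per-thread counters; std::hardware_destructive_interference_size (C++17, <new>) gives the cache-line size, with 64 as the usual fallback.

#include <atomic>
#include <new>      // std::hardware_destructive_interference_size

// Each counter gets its own cache line, so two threads incrementing
// different counters do not invalidate each other's line.
struct alignas(std::hardware_destructive_interference_size) PaddedCounter {
    std::atomic<long> value{0};
};

PaddedCounter per_thread_counters[8];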


All-in-all summary


Avoid long branches:

// BAD
void process_data(Data* d) {
    if (!d) {
        // LONG BRANCH / COLD CODE
        // Imagine 100 lines of logging, stack tracing, 
        // and complex error recovery logic here.
        log_error("Null pointer detected...");
        cleanup_subsystems();
        notify_admin_via_snmp();
        throw std::runtime_error("Fatal Error");
    }

    // This "hot" code is now physically far away from the 'if' 
    // check in the compiled binary.
    d->value += 42; 
    d->status = Ready;
}

// Better: avoids the I-cache miss

// Move cold logic to a non-inline function
[[noreturn]] void handle_fatal_error() {
    log_error("Null pointer detected...");
    cleanup_subsystems();
    notify_admin_via_snmp();
    throw std::runtime_error("Fatal Error");
}

void process_data_optimized(Data* d) {
    // C++20 [[likely]]/[[unlikely]] attributes guide the compiler
    if (!d) [[unlikely]] {
        handle_fatal_error(); // A jump to a far-away location happens ONLY on error
    }

    // This code is now physically adjacent to the 'if' check
    d->value += 42; 
    d->status = Ready;
}




Jan 5, 2026

[6.S191][note] Introduction to Deep Learning


Why activation functions?
 They introduce non-linearities into the network.
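Two standard examples of such non-linearities (added here for reference, not copied from the slides):

\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \mathrm{ReLU}(z) = \max(0, z)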


The loss of a network measures the cost incurred from incorrect predictions.

The empirical loss measures the total loss over the entire dataset.


Cross-entropy loss can be used with models that output a probability between 0 and 1.
f() here is the activation function; it returns a value in [0, 1].
- the minus sign is used because the log of a probability is negative, so we negate it to flip it to a positive value.
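A reconstruction of the missing formula, assuming the standard binary cross-entropy (y_i is the true label, f(x_i; W) the predicted probability):

J(W) = -\frac{1}{n} \sum_{i=1}^{n} \Bigl[ y_i \log f(x_i; W) + (1 - y_i) \log\bigl(1 - f(x_i; W)\bigr) \Bigr]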


Mean squared error loss can be used with regression models that output continuous real numbers.
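The standard mean-squared-error formula, in the same notation (y_i is the true value, f(x_i; W) the prediction):

J(W) = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - f(x_i; W) \bigr)^2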


Training the network

Focus on Loss Optimization
We want to find the network weights that achieve the lowest loss.



Here's where gradient descent comes in (backpropagation).

i.e. for every weight.
By looking at the derivative (the slope m), we can see whether m is > 0 or < 0. If > 0, we lower the weight; if < 0, we increase it.


So how big a step should we take? (the learning rate)
Set it too small and training is slow; set it too large and it becomes unstable.
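The update rule this describes, in its standard form (not copied from the slides), with learning rate \eta:

W \leftarrow W - \eta \, \frac{\partial J(W)}{\partial W}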


Methods:








Jan 4, 2026

[AI math] Proximal Policy Optimization (PPO)

Proximal Policy Optimization (PPO, 2017)

It tries to solve the destructive-update issue.

In Reinforcement Learning, if you take a step that is too large, the agent "falls off a cliff." It adopts a policy that causes it to fail immediately, which generates bad data, which leads to even worse updates. It never recovers.

The Deep Dive: The Mechanics of "Clipping"

The genius of PPO lies in its pessimistic view of the world. It doesn't trust its own recent success.
When the agent collects data, it calculates the Ratio between the new policy and the old policy for a specific action:
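The ratio referred to here (standard PPO definition):

r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}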

The "Pessimistic" Update

Standard algorithms simply push this ratio as high as possible for good actions. PPO says: "Stop."

  • It looks at the Advantage (how good the action was) and applies a Clip:
    If the action was good (Positive Advantage): PPO increases the probability of doing it again, BUT it caps the Ratio at roughly 1.2 (increasing probability by 20%).
    Even if the math says "this action was amazing, boost it by 500%," PPO clips the update at 20%. It refuses to overcommit to a single piece of evidence.
  • If the action was bad (Negative Advantage):
    PPO decreases the probability, BUT it caps the decrease at 0.8. It refuses to completely destroy the possibility of taking that action again based on one failure.
This results in a "Trust Region"—a safe zone around the current behavior where the agent is free to learn, but beyond which it is physically prevented from moving.
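The clipped surrogate objective that implements this trust region (standard PPO formula; \epsilon \approx 0.2 matches the 1.2 / 0.8 caps above, A_t is the Advantage):

L^{\text{CLIP}}(\theta) = \mathbb{E}_t\Bigl[ \min\bigl( r_t(\theta)\, A_t,\ \operatorname{clip}\bigl(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\bigr)\, A_t \bigr) \Bigr]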

Usage

Reinforcement Learning from Human Feedback (RLHF).

The Setup

  • The Agent (Policy): The LLM itself (e.g., Llama-2-70b). Its "Action" is choosing the next token (word) in a sentence.
  • The Environment: The conversation window.
  • The Reward Model: A separate, smaller AI that has been trained to mimic a human grader. It looks at a full sentence and outputs a score (e.g., 7/10 for helpfulness).

The PPO Training Loop

  1. Rollout (Data Collection): The LLM generates a response to a prompt like "Explain gravity."
    LLM: "Gravity is a force..." It records the probability of every token it chose (e.g., it was 99% sure about "force").
  2. Advantage Calculation: The Reward Model looks at the finished sentence and gives it a score (Reward). The PPO algorithm compares this score to what it expected to get.
    1. Scenario: The model usually writes boring answers (Expected Reward: 5). This answer was witty and accurate (Actual Reward: 8).
    2. Advantage: +3. This was a "better than expected" sequence of actions.
  3. The PPO Update (The Critical Step): We now update the LLM's neural weights to make those specific tokens more likely next time.
    1. Without PPO: The model might see that high reward and drastically boost the probability of those specific words, potentially overfitting and making the model speak in repetitive loops or gibberish just to chase that score.
    2. With PPO: The algorithm checks the Ratio.
      "Did we already increase the probability of the word 'force' by 20% compared to the old model?"
      Yes? -> CLIP. Stop updating. Do not push the weights further.
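A minimal numeric sketch of that check for a single token, assuming \epsilon = 0.2 (all values are hypothetical):

#include <algorithm>
#include <iostream>

// Pessimistic (clipped) objective for one action.
// ratio     = pi_new(token | prompt) / pi_old(token | prompt)
// advantage = actual reward minus the Critic's expected reward
double clipped_objective(double ratio, double advantage, double eps = 0.2) {
    double unclipped = ratio * advantage;
    double clipped   = std::clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage;
    return std::min(unclipped, clipped); // always take the worse (pessimistic) value
}

int main() {
    // Good action, but the token is already 50% more likely than under the old
    // policy: the clipped term (1.2 * 3.0) wins, so the boost stops at +20%.
    std::cout << clipped_objective(1.5, 3.0) << '\n';  // 3.6

    // Bad action whose probability was already pushed below 0.8x: the clipped
    // term (0.8 * -2.0) wins and is constant, so nothing pushes it further down.
    std::cout << clipped_objective(0.4, -2.0) << '\n'; // -1.6
}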

PPO is not just an optimization method; it is a constraint method.
It allows AI to run training loops that would otherwise be unstable.


Policy

Policy is a function that maps a State (what the agent sees) to an Action (what the agent does).
In mathematical formulas, the policy is almost always represented by the Greek letter π.
a = π(s)
s: State(input)
π: Policy(logic)
a: Action(output)

The Two Types of Policies

  • Deterministic Policy
    This policy has no randomness. For a specific situation, it will always output the exact same action.
    Example: A chess bot. If the board is in arrangement X, it always moves the Knight to E5.
    a = π(s)
  • Stochastic Policy (Used in PPO)
    This policy deals in probabilities. Instead of outputting a single action, it outputs a probability distribution over all possible actions. The agent then samples from this distribution.
    Example: A robot learning to walk. If it is tilting left, it might be 80% likely to step right and 20% likely to wave its arm. It rolls the dice to decide.
    π(a | s) : read as "the probability of taking action a given state s"
Stochastic policies are essential for learning. If a policy is 100% deterministic, the agent never tries anything new; it just repeats the same mistakes. By adding randomness (probabilities), the agent explores different actions, which allows it to discover better strategies.


In modern AI (like PPO), the "Policy" is a Neural Network.
  • Input: The network receives the State (e.g., the pixels of a video game screen, or the text of a user prompt).
  • Hidden Layers: The network processes this information.
  • Output: The network outputs numbers representing the probability of each action.
    Action 1 (Jump): 0.1
    Action 2 (Run): 0.8
    Action 3 (Duck): 0.1
When we say we are "training the policy," we are simply adjusting the weights of this neural network so that it assigns higher probabilities to "good" actions and lower probabilities to "bad" actions.
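A minimal sketch of that output step: a softmax turns the network's raw scores (logits) into the action probabilities listed above (the function and numbers are illustrative, not from a specific library):

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

// Softmax: converts logits into a probability distribution over actions.
std::vector<double> softmax(const std::vector<double>& logits) {
    const double max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<double> probs(logits.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - max_logit); // subtract max for numerical stability
        sum += probs[i];
    }
    for (double& p : probs) p /= sum;
    return probs;
}

int main() {
    // Logits for Jump, Run, Duck
    for (double p : softmax({0.5, 2.6, 0.5}))
        std::cout << p << ' ';  // roughly 0.1 0.8 0.1
}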


Value Function (V)

often called the Critic, is the partner to the Policy (the Actor).
While the Policy answers "What should I do?", the Value Function answers: "How good is it to be in this situation?"

  1. The Core Concept: Prediction
    Imagine you are playing a video game.
    The Policy (Actor): Looks at the screen and presses the "Jump" button.
    The Value Function (Critic): Looks at the screen and says, "We currently have a 70% chance of winning."
    The Critic doesn't play the game; it predicts the outcome.
  2. The Math V(s)
    The Value function maps a State s to a single number (Scalar).
  3. Why PPO Needs the Critic (The Advantage)
    This is the most important part. PPO updates the Policy based on the Advantage. The Advantage is calculated using the Value Function.
    The Logic: To know if an action was "good," we can't just look at the reward.
    Example: If you get a reward of +10, is that good?
    If you usually get +1, then +10 is amazing.
    If you usually get +100, then +10 is terrible.
    The Value Function provides that baseline "usual" expectation.

  4. How the Critic Learns
    While the Actor learns to maximize reward, the Critic learns to be a better predictor.
    It uses a simple regression loss function (Mean Squared Error); the formula is sketched after this list.

    At every step, the Critic looks at the reward actually received and updates itself: "I predicted this state was worth 5 points, but we actually got 8. I should update my weights to predict higher next time."
  5. The PPO Team:
    Actor (Policy π): "I think we should move Left."
    Environment: (Agent moves Left, gets +5 reward, lands in new state).
    Critic (Value V): "I expected a reward of +2. You got +5. That was a great move!"
    PPO Update: "Since the Critic said that was 'great' (Positive Advantage), let's adjust the Actor's weights to make 'Move Left' more likely next time—but clip it so we don't go crazy."
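The two formulas this section talks around, in their simplest standard form (R_t is the reward actually received, V(s_t) the Critic's prediction; real PPO implementations usually use a fancier advantage estimator such as GAE):

A_t = R_t - V(s_t)

L_{\text{critic}} = \mathbb{E}_t\bigl[ \bigl( V(s_t) - R_t \bigr)^2 \bigr]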

Jan 3, 2026

[deep dive llm (gpt-2)]

Reference:
https://huggingface.co/spaces/HuggingFaceFW/blogpost-fineweb-v1
Let's reproduce GPT-2 (1.6B): one 8XH100 node, 24 hours, $672, in llm.c

Tool:
https://tiktokenizer.vercel.app/

Library:
https://github.com/google/sentencepiece


Flow



[C++26] std::execution

Reference:
https://en.cppreference.com/w/cpp/experimen

P3288R3 std::elide
https://www.virjacode.com/papers/p3288.htm

code:

constexpr explicit operation_state(Range&& r, Receiver rcvr)
    : /* ... */
    states_(
        std::from_range,
        std::ranges::views::transform(
            std::forward<Range>(r),
            [&]<typename U> requires
                std::constructible_from<Sender, U>(U&& u)
            {
                return elide([&, s = std::forward<U>(u)]() mutable {
                    return state_(*this, std::move(s));
                });
            })),
    /* ... */
{}

[C++] Equality preservation

Reference:
Equality preservation

Equality Preservation is a rule that says a function (or expression) must be "well-behaved" and "predictable."

It essentially means: If you give it the same inputs, you should get the same outputs.

  • The Core Idea: "No Surprises"
If we have two objects, a and b, and they are equal (a == b), then doing the same thing to both of them should result in the same outcome.

Equal Inputs -> Equal Results: If f(a) returns 5, and a == b, then f(b) must also return 5.
Equal Inputs -> Equal Side Effects: If calling f(a) changes a to 10, then calling f(b) must change b to 10.
  • The Requirement of "Stability"
The standard also requires equality-preserving expressions to be stable. This means if we haven't changed the object ourselves, calling the function twice on the same object must give the same result both times.

Example of something NOT stable:
A function that returns a random number, or a function that uses a global counter that increments every time you call it, i.e. it has no referential transparency.
    int counter = 0;
    int bad_function(int x) { return x + (counter++); } // NOT equality-preserving


By requiring concepts to be equality-preserving, C++ guarantees that:
  • The algorithm can trust the values it reads.
  • The algorithm can safely make copies of objects and expect the copies to behave exactly like the originals.

Unless noted otherwise, every expression used in a requires expression of the standard library concepts is required to be equality preserving, and the evaluation of the expression may modify only its non-constant operands. Operands that are constant must not be modified.

In the standard library, the following concepts are allowed to have non equality-preserving requires expressions:
  • output_iterator
  • indirectly_writable
  • invocable
  • weakly_incrementable
  • range

Jan 1, 2026

[algorithm] minhash

Jaccard Similarity
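The standard definition:

J(A, B) = \frac{|A \cap B|}{|A \cup B|}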



The Conceptual Process
  • Shingling: Convert documents into sets of short strings (e.g., 3-word phrases).
  • Permutation: Imagine taking all possible shingles and randomly shuffling their order.
  • The "Min" Hash: The MinHash value for a document is the index of the first shingle that appears in that document after the shuffle.
The probability that two sets have the same MinHash value is exactly equal to their Jaccard Similarity:
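In formula form (the standard MinHash property):

\Pr\bigl[ h_{\min}(A) = h_{\min}(B) \bigr] = J(A, B)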


Benefit

  • Efficiency: we can compare two 1MB documents by just comparing 100 integers (their MinHash signature).
  • Scalability: It is often paired with Locality Sensitive Hashing (LSH) to find similar items in sub-linear time, meaning we don't have to check every single pair in a database.
  • Storage: only need to store the compact signatures, not the full text of the documents.

Real-World Applications

  • Search Engines: Detecting "mirror" websites or slightly modified versions of the same article.
  • Genomics: Comparing DNA sequences to find similar species or genes.
  • Plagiarism Detection: Finding overlapping blocks of text across a massive database of student essays.



#include <iostream>
#include <vector>
#include <set>
#include <algorithm>
#include <limits>
#include <cstdint>   // uint32_t
#include <cstdlib>   // rand

using namespace std;

// A simple hash function to simulate different permutations
// h(x) = (a * x + b) % c
uint32_t hash_func(uint32_t x, uint32_t a, uint32_t b, uint32_t c) {
    return (a * x + b) % c;
}

int main() {
    // 1. Represent two documents as sets of "shingles" (hashed strings)
    set<uint32_t> doc1 = {10, 25, 30, 45, 50};
    set<uint32_t> doc2 = {10, 25, 31, 46, 50}; // Mostly similar

    // 2. Define parameters for our MinHash signatures
    int num_hashes = 100;
    uint32_t large_prime = 1000003;

    // Coefficients for our hash functions (randomly chosen for this example)
    vector<uint32_t> a_coeffs(num_hashes), b_coeffs(num_hashes);
    for(int i = 0; i < num_hashes; i++) {
        a_coeffs[i] = rand() % 500 + 1;
        b_coeffs[i] = rand() % 500 + 1;
    }

    // 3. Generate Signatures
    vector<uint32_t> sig1(num_hashes, numeric_limits<uint32_t>::max());
    vector<uint32_t> sig2(num_hashes, numeric_limits<uint32_t>::max());

    for (int i = 0; i < num_hashes; i++) {
        // Find minimum hash for doc1
        for (uint32_t shingle : doc1) {
            sig1[i] = min(sig1[i], hash_func(shingle, a_coeffs[i], b_coeffs[i], large_prime));
        }
        // Find minimum hash for doc2
        for (uint32_t shingle : doc2) {
            sig2[i] = min(sig2[i], hash_func(shingle, a_coeffs[i], b_coeffs[i], large_prime));
        }
    }

    // 4. Estimate Similarity
    int matches = 0;
    for (int i = 0; i < num_hashes; i++) {
        if (sig1[i] == sig2[i]) matches++;
    }

    double estimated_jaccard = (double)matches / num_hashes;
    cout << "Estimated Jaccard Similarity: " << estimated_jaccard << endl;

    return 0;
}