Thoughts

un rlhf-ing myself

I’ve spent a lot of my adult life removing my sft and rlhf so I can get to the more capable but less polished pretraining

June 4, 2026

The trap of "Any reasonable person would feel that way"

Recently, I realized that I often live up to my expectation of how I should feel. E.g., if something goes wrong, I often ask myself if my reaction is reasonable.

That makes sense, and when I find my reaction to be unreasonable, the question is a useful tool. But when I find it to be reasonable, that can intensify emotions that don't really exist, especially feeling wronged.

So I'm trying to do this less.

June 1, 2026

Devtools are consumer products

In my mind the defining attributes of devtools as a category are

  • They help with sometimes minute aspects of a user's workflow - they are optimization tools
  • They are often created by developers for developers, and often start off for personal use
    • Though developers in general benefit from the proliferation of devtools
  • In their procurement: often, devtools are paid for by companies to increase the productivity of their employees
  • The driving logic of devtools is that it is worth being more productive, it is worth composing together tools to be a 10x or 100x developer

Why shouldn't the same logic be applied to consumer products which are:

  • Similarly designed to help individuals
  • Benefit most when the people making it are also users of the thing

The main difference is that currently, we imagine consumer products as either vapid (because they're for consumers) or fundamentally business products. These are shortcuts IMO, and bad ones.

But why shouldn't people want to be 10x or 100x better in their lives? So devtools are consumer products and consumer products should be thought of like devtools.

May 23, 2026

Hypothesis: tools that are aspirationally useful to us will be useful to others

By aspirationally useful to us I mean:

  • aspirational: something that would enable us to live life closer to our optimum (defined as being able to do the most impactful version of the things we care about)
  • useful at an individual level - not for our companies utility functions (to the extent they diverge from our own) - they serve us individually
May 23, 2026

mimicry in product management

Something I’ve subconsciously been doing is altering experiences so they “feel” more like an app of a particular category - e.g., consumer apps should feel almost dumbed down and almost faceless, while prosumer apps should feel the opposite, with all the dials and knobs exposed and a push for users to join communities.

This is interesting to me because

  1. these general feelings are often not based in logic or what we know about the customer or what would make their experience better
  2. Despite kind of knowing 1), I still do it!

A charitable way to explain it:

  • Users also have these same kinds of “feelings” about what consumer software etc. looks like, and we are merely conforming to them

A less charitable way:

  • they are a cop out that dumbs down our audience & assumes symmetry between the different things they do in life where there isn’t any
  • e.g., consumers probably turn their brains off when ordering DoorDash so simplicity is imperative there, but there are other places where they want powerful tools
  • A generalization of this is that we assume what there is is all there should be.

Probably both of them are true. I care more about the latter, because as a startup founder that is where innovation lies.

May 23, 2026

ai agents exaggerate the "nice" properties of computers:

  • determinism
  • declarativeness
  • simplicity and uniformity (the avoidance of special cases)

This is because they operate with the reasoning capabilities of humans but 1000x the output.

In many cases the affordances we provide to humans can be provided by agents or created at a second level

May 21, 2026

Allowing users to contribute to and extend existing apps is wonderful

May 21, 2026

Arguments against usage-based pricing (I can't think of good arguments for)

Let's look at existing claude code monthly plan subscribers. Currently they pay for capacity at a below api-cost rate

This is for 3 reasons I can think of:

  1. Claude code session limits allow for smoothing of token demand at a user level (and also at a population level, since you can reduce the limits at more popular times)
  2. Most users of claude code do not fully use up their usage-limits, thus claude code functions kind of like insurance against needing a lot of tokens
  3. Claude code also increases demand for tokens at some level, on top of the "insurance" effect; that is, users overall will just spend more on AI even if they don't have spiky token usage
  4. (I am sure there are reasons I cannot think of)

Thus the argument against token based pricing is that these nice properties don't hold for token based pricing. You don't have float, you don't have demand smoothing, you don't have this nice property that users just use your product more when it is structured as a subscription. Indeed you can see this counterfactual in the growth of Claude code before (in the per-token pricing era) and after the $200/mo plan came out (though, that is also related to Claude code simply getting way better)

Ok so what's the case for usage based pricing: it makes your price proportional to the value the user gets out of it. Which is useful because, in expectation, most users get a better deal out of it.

However, this isn't really an argument for why the usage-based pricing is economically good. Sure the user will be better off in expectation if they are thinking purely rationally. But they may not be, they may prefer to have the price be a smooth quantity they can plan around.

And even if you capped the price that users will pay, they will likely think of the maximum as the price they will pay.

The only argument for usage based pricing I can think of that would work on consumers would be showing them their actual usage as evidence they are getting taken by fixed pricing.

And to think I started out with the idea that usage-based pricing was good

May 21, 2026

claude code's business model is like insurance

Most users of claude code do not fully use up their usage-limits most weeks, thus claude code functions kind of like insurance against needing a lot of tokens. When you do need them, they are cheap, but for many users they are paying in more than they get out. Also the inactive users subsidize the active ones to (I imagine) make it a profitable enterprise for anthropic

May 21, 2026

Coding w LLM: have the LLM write a test first in a separate commit, make sure the test fails and then succeeds once the patch is applied

May 21, 2026

Combatting the planning fallacy

  1. If you know how long things have taken in the past, use that data
  2. If you don't know that, apply a heuristic pessimistic enough that you are 50% likely to take less time than the estimate
    1. One deeply simple heuristic: be as pessimistic as you can reasonably manage, then double it
  3. Try to decompose something big into its parts (we tend not to do that automatically, leading to coarse-grained and bad estimates)
    1. While doing this, you should account for T_unknown, the subtasks you don't anticipate (the idiom "once you do 90% you only have the second 90% left" applies)
    2. Then apply your favorite heuristic from #2
  4. You should also discount how much of an improvement you think something will be (we overestimate the benefit while underestimating the cost)
  5. Time boxing (useful because part of planning fallacy is Parkinson's Law)
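
A rough sketch of how steps 1-3 could be combined, in Python. The function name, the 25% default for T_unknown, and the choice of the median of past actual/estimate ratios are all assumptions made for illustration; the list above doesn't prescribe any of them.

```python
from statistics import median


def estimate_hours(subtasks, past_ratios=None, unknown_fraction=0.25,
                   fallback_multiplier=2.0):
    """Return a deliberately pessimistic estimate in hours.

    subtasks: dict of subtask name -> gut-feel hours (step 3: decompose).
    past_ratios: actual/estimated ratios from earlier work (step 1), if any.
    unknown_fraction: T_unknown as a fraction of the known work (assumed knob).
    fallback_multiplier: the "then double it" heuristic (step 2) when no data.
    """
    known = sum(subtasks.values())
    with_unknown = known * (1 + unknown_fraction)
    # Prefer real history; otherwise fall back to the doubling heuristic.
    multiplier = median(past_ratios) if past_ratios else fallback_multiplier
    return with_unknown * multiplier


if __name__ == "__main__":
    plan = {"schema change": 2, "backfill": 2}
    print(estimate_hours(plan))                               # no history: double it
    print(estimate_hours(plan, past_ratios=[1.0, 1.5, 3.0]))  # use past overruns
```

The median is used rather than the mean so that one disastrous overrun doesn't dominate, which lines up with the "50% likely to take less time" target.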
May 21, 2026

Cool software details in the Apple Music app

  • Apple Music radio shows change the cover art & artist as you listen
  • the playlists you save to your library update, including the ones that change weekly (I’m not sure if this is positive)
  • Apple animated album covers
May 21, 2026

Corollary: pushing yourself is a kind of selective activity, you shouldn't really make everything super stressful

May 21, 2026

Corollary to the planning fallacy:

People underestimate what they can do in the long run (e.g., due to not understanding compounding)

This is really the same as people underestimating what can be done in general

May 21, 2026

GitHub (and Cursor & other cloud environments) should support interactive diffs, so you can edit the diffs with small changes as you go.

May 21, 2026

"He had no idea of its cause, still less of its cure; but discontent had  come into his soul, and he had taken one small step toward humanity" (from 2001: A Space Odyssey)

May 21, 2026

how does iOS's system for auto-removing unused apps + redownloading them later work? it's very seamless & preserves all data

May 21, 2026

Idea for tuning LLMs for translation & code / RLAIF

Give one instance of an LLM some rich context and have it produce a description of the code it would like.

Pass it to a second instance and then have it generate the code

Pass the code back to the first instance and iterate until the code fits the context.

You now have (long context, short description, code) triples which can be used to enrich the training
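
A minimal sketch of that loop in Python. Everything here is hypothetical scaffolding: `describe`, `generate`, `fits` and `revise` stand in for calls to the two LLM instances and are passed in as plain functions, so no real model API is assumed.

```python
from typing import Callable, NamedTuple


class Example(NamedTuple):
    context: str       # the long, rich context only the first instance sees
    description: str   # the short description handed to the second instance
    code: str          # what the second instance generated


def build_example(
    context: str,
    describe: Callable[[str], str],          # instance 1: context -> description
    generate: Callable[[str], str],          # instance 2: description -> code
    fits: Callable[[str, str], bool],        # instance 1: does code fit context?
    revise: Callable[[str, str, str], str],  # instance 1: improve the description
    max_rounds: int = 3,
) -> Example:
    description = describe(context)
    code = generate(description)
    for _ in range(max_rounds):
        if fits(context, code):
            break
        # The second instance never sees the context, only a better description.
        description = revise(context, description, code)
        code = generate(description)
    return Example(context, description, code)
```

Note the sketch returns the last attempt even if it never fits within `max_rounds`; whether to keep or drop those examples is a separate choice.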

May 21, 2026

In general, people should schedule themselves for the future (i.e., "I will revisit this at X time", "I will read this at Y time", etc.) to help them accomplish their goals, but they don't - they fail to do things that are best done later at the appropriate time

Either this is a psychological thing, or we just haven't made this advancement as a society yet

May 21, 2026

It's possible to do things slowly and still yield results, given they are the right things. Speed has become kind of an aesthetic thing in twitter technology culture

Maybe the underlying assumption is that you wish to gain wealth as fast as possible, but there are definitely some things that take longer than a few months

May 21, 2026

LLM debuggers & sympathy

Many problems with LLMs can feel incredibly inscrutable because, unlike code, we cannot trace the internal mechanics of an LLM. Thus, it is not possible to mechanically trace back to the source of an error.

However, it is simultaneously true that, at the current high levels of LLM capability, many errors in an LLM's output are caused by poor context engineering (e.g., overloading the context window, not providing the LLM the right context for what to do in case of an error, not being careful enough about what makes it into the token stream)

Thus, while we cannot mechanically trace back errors, we can intuitively understand them by creating an LLM debugger that shows the context of the LLM.

I think this has a strong analogy to the idea of sympathy. We can either choose to be frustrated by poor and illegible outcomes from an LLM or aim to develop sympathy by using an LLM debugger to see what the LLM sees

implementation

A simple implementation is to wrap the anthropic or openai sdk and intercept the inputs and outputs, then display them in a nice way, including things like token usage at each step.

A better implementation allows us to attach context to the tokens - e.g., to reveal if we are incorrectly presenting the underlying data to the LLM or if we are supplying too much of it
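
A sketch of the simple version in Python. To avoid guessing at any SDK's exact interface, it wraps a generic `call_model(messages) -> str` function (a hypothetical stand-in for whatever anthropic or openai call you actually make), and the "token" count is just a whitespace word count, a crude proxy rather than real tokenizer output.

```python
import copy
from typing import Any, Callable


class LLMDebugger:
    """Records exactly what the model was shown at each step."""

    def __init__(self, call_model: Callable[[list], str]):
        self._call_model = call_model
        self.steps: list[dict[str, Any]] = []

    def __call__(self, messages: list) -> str:
        reply = self._call_model(messages)
        self.steps.append({
            # Snapshot the input so later mutation can't rewrite history.
            "messages": copy.deepcopy(messages),
            "reply": reply,
            # Crude proxy for token usage: whitespace-separated words.
            "approx_input_tokens": sum(len(m["content"].split()) for m in messages),
        })
        return reply

    def dump(self) -> str:
        """Render every step as plain text: see what the LLM saw."""
        lines = []
        for i, step in enumerate(self.steps, 1):
            lines.append(f"step {i} (~{step['approx_input_tokens']} input tokens)")
            for m in step["messages"]:
                lines.append(f"  [{m['role']}] {m['content']}")
            lines.append(f"  -> {step['reply']}")
        return "\n".join(lines)
```

Usage would be something like `debug = LLMDebugger(my_call)`, then calling `debug(messages)` wherever `my_call(messages)` was used, and printing `debug.dump()` when an outcome looks illegible.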

May 21, 2026

LLMs are contextual, meaning they can behave significantly better or worse depending on the context. This provides an opportunity for self-improvement within the same model

May 21, 2026

P(service is hacked over the next Y years and exposes your personal data), where the service is GitHub, X.com, etc.?

Prediction markets (maybe ones with nothing at stake, since a stake may incentivize a hack)

IMO, this is very high. I would put my current P(github is hacked over next 2 years) at 50%

Therefore, I probably should not trust github private repos with anything sensitive
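
To compare a "50% over 2 years" guess across other horizons, it can be rescaled. This small Python sketch assumes a constant, independent chance of a hack per unit of time, which is my simplifying assumption and not a claim about how breaches actually arrive.

```python
def rescale_probability(p: float, from_years: float, to_years: float) -> float:
    """Convert P(at least one hack in from_years) to the same for to_years.

    Assumes a constant hazard: the chance of surviving unhacked compounds
    multiplicatively over time.
    """
    survive = (1.0 - p) ** (to_years / from_years)
    return 1.0 - survive


if __name__ == "__main__":
    print(round(rescale_probability(0.5, 2, 1), 3))   # implied 1-year probability
    print(round(rescale_probability(0.5, 2, 10), 3))  # implied 10-year probability
```

Under that assumption, 50% over 2 years is roughly 29% per year and about 97% over a decade.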

May 21, 2026

Relax for the same results - https://sive.rs/relax

The idea is that you can get the same results by not being super intense about everything.

I think part of the idea is that you don't bind up your ego with the thing you do when you don't treat it as do or die

May 21, 2026

"Remember this browser" popups should happen after you log in! There's muscle memory to hit enter after you type your password which means I miss the little checkbox

May 21, 2026

"saying 'Someone else should run this company' is like saying 'Someone else be the husband to my wife', even if you think it you never say it!" - PG paraphrased by ryan petersen paraphrased by me

Can't attest to the accuracy, but it is hilarious!

May 21, 2026

Small affordance in the alarms section on iOS. When you modify an alarm it is also turned on. My engineer brain would say those are 2 separate things, but you only edit an alarm to set it. Good design doesn’t fall prey to my engineer brain.

May 21, 2026

Someone please make the Contacts app on macOS not dog slow (I'm on macOS 26 but I think Sequoia had the same issue)

May 21, 2026

something about LLM apis costing money (even though it's just a little bit) gets stuck in my craw when using them in tools. I think we will get over this.

For instance, I want to select titles of my thoughts from the contents (since a deterministic approach like taking the first few words doesn't always work). I feel hesitant to put an llm to work even though it would cost me pennies.

we think strangely when it comes to money!

May 21, 2026

"Technology isn't the same thing as science at all. And trying lots of different ways to do something isn't the same as experimenting to figure out the rules." There were plenty of people who'd tried to invent flying machines by trying out lots of things-with-wings, but only the Wright Brothers had built a wind tunnel to measure lift..."

  • Harry Potter and the Methods of Rationality
May 21, 2026

testing hypotheses in science

The goal is not to prove or disprove a hypothesis but to gain maximum information about it. This could mean positive or negative testing

(and I guess we have some threshold or randomization factor to determine how to act once we gain information)

May 21, 2026

testing over the entire hypothesis space

In reality there is a set of hypotheses

  • you cannot enumerate all of them (there is always some H_other)
  • any observation reallocates probability among them
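
A small Python sketch of what "allocating probability" looks like with an explicit catch-all hypothesis. The hypothesis names, the priors, and especially the likelihood assigned to H_other are made-up illustrative numbers; in practice the likelihood under H_other is a guess by construction.

```python
def update(prior: dict, likelihood: dict) -> dict:
    """One Bayesian update: posterior is proportional to prior * likelihood."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: v / total for h, v in unnormalized.items()}


if __name__ == "__main__":
    prior = {"H1": 0.5, "H2": 0.4, "H_other": 0.1}
    # An observation that neither named hypothesis predicted well.
    likelihood = {"H1": 0.1, "H2": 0.1, "H_other": 0.5}
    posterior = update(prior, likelihood)
    for h, p in posterior.items():
        print(h, round(p, 3))
```

When an observation is surprising under every named hypothesis, the mass flows to H_other, which is the signal that the enumerated set is missing something.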

https://claude.ai/share/3062a9c1-48de-4919-a0fd-a906d7cf5453

May 21, 2026

The Bayesian VC

One would assume that many VCs would make use of Bayesian reasoning to understand the world. After all, they are in an environment with lots of signals that they must apply to companies. Being rash and saying "X failed in the past, therefore I will put 0% allocation into X" is a failure mode; therefore doing a bayesian update on P(X succeeds) seems prudent.

Somehow, I doubt many VCs do this, which is not a shot at VCs; it could be that it is not useful for a number of reasons. The one that I can think of is that most of the signals simply don't matter; as many are fond of saying, it's more about the founders.

However, given that VCs are very actively reacting to changes in the world, I don't think this is true.

May 21, 2026

The codex computer use model is incredibly elegant:

  • separate computer use mcp which also has all the UI that's specific to computer use (e.g., the cursor and the menubar icon)
  • integrate that mcp as just another skill
May 21, 2026

The ideal situation for a small team (2-3 people) is that when 1 person becomes more vocal, the other people do too. Lean into it.

Often it goes the opposite way, the other person shrinks (I personally have done this)

May 21, 2026

the isolation effect (von Restorff effect) for marketing

When you have a list of options, any option that has something different stands out even if the signal itself is weak

We have a stand-out bias - which can be exploited to make things stand out

May 21, 2026

"There are few ways in which a man can be more innocently employed than in getting money"

  • Samuel Johnson (found via the Money Stuff Podcast ep on Jan 2, 2026)
May 21, 2026

Timeboxing: Parkinson's Law + Planning Fallacy

Parkinson's Law: Things take as long as you have to do them ("Deadlines are the condition of productivity")

Timeboxing: setting a fixed period of time to work on some activity and stopping unconditionally

Why it works

  • Parkinson's law implies that setting a shorter time period may allow us to compress the shenanigans
  • You make progress on something, without the anxiety of completing it (which is known to be a hard problem to estimate, according to the planning fallacy)
May 21, 2026

To build products that are as good as Apple in its heyday means you have to build the products first and then wait 10 years for opinion to catch up; being patient is crucial

May 21, 2026

We are always like fish - the water we are in imposes optimization criteria and constraints we can't detect.

For instance, in startups, maybe one implicit criterion is getting wealth or revenue as fast as possible. This is good signal, but it may not actually be aligned with the goal we set for ourselves (e.g., to build a particular kind of thing we wish to see in the world etc)

May 21, 2026

When designing an improvement for a system, you typically don’t want it to get worse. Good strategy:

  • figure out an error rate for the old system
  • compute the agreement between the new system + the old system
  • if the disagreement (1 - agreement) is significantly greater than the error rate of the old approach, you have a problem: the new system must be introducing errors of its own
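
A sketch of that check in Python, reading it as "disagreement well above the old error rate is the warning sign". The reasoning: whenever the two systems disagree, at least one of them is wrong, so disagreement <= old error + new error. The function names are mine, and the outputs are assumed to be directly comparable labels.

```python
def disagreement_rate(old_outputs, new_outputs) -> float:
    """Fraction of inputs on which the two systems give different answers."""
    if len(old_outputs) != len(new_outputs) or not old_outputs:
        raise ValueError("need two non-empty, equal-length output lists")
    differing = sum(o != n for o, n in zip(old_outputs, new_outputs))
    return differing / len(old_outputs)


def min_new_error_rate(old_error_rate: float, disagreement: float) -> float:
    """Lower bound on the new system's error rate.

    From disagreement <= old_error + new_error.
    """
    return max(0.0, disagreement - old_error_rate)


def certainly_worse(old_error_rate: float, disagreement: float) -> bool:
    """True when the bound alone proves the new system has more errors."""
    return min_new_error_rate(old_error_rate, disagreement) > old_error_rate


if __name__ == "__main__":
    old = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
    new = [1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
    d = disagreement_rate(old, new)
    print(d, min_new_error_rate(0.1, d), certainly_worse(0.1, d))
```

This needs no labels for the new system, only a known error rate for the old one, which is what makes it cheap to run before shipping.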
May 21, 2026

"You tell me whar a man gits his corn pone, en I'll tell you what his 'pinions is."

Part of this means our incentives shape our constraints. (there is more to it - Twain also says all opinions originate in this way. See: https://paulgraham.com/cornpone.html for a more complete excerpt)

May 21, 2026

Listening to “Human Resources” by Dan Carlin, about slavery.

One is struck how much people were able to shape their opinions to their monetary incentives (eg, revolutionary France controlling Saint-Domingue, now Haiti)

It’s worth listening to because we are underinformed about slavery, since it is avoided in media except to make a current political point

https://podcasts.apple.com/us/podcast/dan-carlins-hardcore-history/id173001861?i=1000553133741

January 27, 2026

Small affordance in the alarms section on iOS. When you modify an alarm it is also turned on. My engineer brain would say those are 2 separate things, but you only edit an alarm to set it. Good design doesn’t fall prey to my engineer brain.

January 26, 2026