Thoughts

Price competition in coding agent plans

I'm surprised that codex has not reduced the price of their top or mid-tier offering; in particular, they offer more tokens/$, so they could have a cheaper plan that does the same amount as claude

Instead the pricing of claude and chatgpt remains mirrored

June 18, 2026

Test Sensitivity

Tests provide some bits of information about how good your implementation is
- against some baseline, typically against your brain or the brain of a smart coder who doesn't know the ins and outs of your code
How many bits of implementation is how sensitive it is
A 0 bit test is one that vacuously passes

June 14, 2026

Using experiments to get around habits

I often feel the need to clean my workspace before i start working on something. this can be bad when i really need to do something, but i feel compelled to do it because somehow it feels more productive

So I framed it as an experiment to myself: do i actually feel any difference when i don't do it. This is a good experiment because we have the potential of being wrong.

Habits are often just beliefs we haven't tried to disprove hard enough. Either we realize their validity or discard them.

June 14, 2026

Google Flights++ (owned by google, but with a more complete but less user-friendly interface) https://matrix.itasoftware.com/search

June 12, 2026

un rlhf-ing myself

I’ve spent a lot of my adult life removing my sft and rlhf so I can get to the more capable but less polished pretraining

June 4, 2026

The trap of "Any reasonable person would feel that way"

Recently, I realized that I often live up to my expectation of how I should feel. E.g., if something goes wrong, I often ask myself if my reaction is reasonable.

That makes sense, and when I find it to be unreasonable it's a useful tool. But when I find it to be reasonable, that can intensify emotions that don't really exist, especially feeling wronged.

So I'm trying to do this less.

June 1, 2026

Devtools are consumer products

In my mind the defining attributes of devtools as a category are

They help with sometimes minute aspects of a users workflow - they are optimization tools
They are often created by developers for developers, and often start off for personal use
- Though developers in general benefit from the proliferation of devtools
In their procurement: often, devtools are paid for by companies to increase the productivity of their employees
The driving logic of a devtools is that it is worth being more productive, it is worth composing together tools to be a 10x or 100x developer

Why shouldn't the same logic be applied to consumer products which are:

Similarly designed to help individuals
Benefit most when the people making it are also users of the thing

The main difference is that the currently, we imagine consumer products as either being vapid because it's for consumer or fundamentally a business product. These are shortcuts IMO, and bad ones.

But why shouldn't people want to be 10 or 100x better in their life? So devtools are consumer products and consumer products should be thought of like devtools.

May 23, 2026

Hypothesis: tools that are aspirationally useful to us will be useful to others

By aspirationally useful to us I mean:

aspirational: something that would enable us to live life closer to our optimum (defined as being able to do the most impactful version of the things we care about)
useful at an individual level - not for our companies utility functions (to the extent they diverge from our own) - they serve us individually

May 23, 2026

mimicry in produce management

Something I’ve subconsciously been doing is altering experiences so they “feel” more like an app for a particular app - eg, consumer apps should feel almost dumbed down and almost faceless, prosumer apps should push users to join communities and feel the opposite with all the dials and knobs exposed and pushing users to join communities.

This is interesting to me because

these general feelings are often not based in logic or what we know about the customer or what would make their experience better
Despite kind of knowing 1), I still do it!

A charitable way to explain it:

Users also have these same kinds of “feeling”s about what a consumer software etc looks like, and we are merely conforming to them

A less charitable way:

they are a cop out that dumbs down our audience & assumes symmetry between the different things they do in life where there isn’t any
eg, consumer probably turn their brain off when ordering DoorDash so simplicity is imperative there, but there are other places where they want powerful tools
A generalization of this is that we assume what there is is all there should be.

Probably both of them are true. I care more about the latter, because as a startup founder that is where innovation lies.

May 23, 2026

ai agents exaggerate the "nice" properties of computers:

determinism
declarativeness
simplicity and uniformity (the avoidance of special cases) because they operate with the reasoning capabilities of humans but 1000x the output

In many cases the affordances we provide to humans can be provided by agents or created at a second level

May 21, 2026

Allowing users to contribute to an extend existing apps is wonderful

Context for this

Github (and cursor & other cloud environments) should

May 21, 2026

Arguments against usage-based pricing (I can't think of good arguments for)

Let's look at existing claude code monthly plan subscribers. Currently they pay for capacity at a below api-cost rate

This is for 3 reasons I can think of:

Claude code session limits allow for smoothing of token demand at a user level (and also at a population level, since you can reduce the limits at more popular times)
Most users of claude code do not fully use up their usage-limits, thus claude code functions kind of like insurance against needing a lot of tokens
Claude code increases demand for tokens too at some level in addition to the "insurance" effect; that is, users overall just will spend more on AI even if they didn't have spiky token usage
(I am sure there are reasons I cannot think of)

Thus the argument against token based pricing is that these nice properties don't hold for token based pricing. You don't have float, you don't have demand smoothing, you don't have this nice property that users just use your product more when it is structured as a subscription. Indeed you can see this counterfactual in the growth of Claude code before (in the per-token pricing era) and after the $200/mo plan came out (though, that is also related to Claude code simply getting way better)

Ok so what's the case for usage based pricing: it makes your value proportional to what the user gets out of it. Which is useful, because, in expectation most users get a better deal out of it.

However, this isn't really an argument for why the usage-based pricing is economically good. Sure the user will be better off in expectation if they are thinking purely rationally. But they may not be, they may prefer to have the price be a smooth quantity they can plan around.

And even if you capped the price that users will pay, they will likely think of the maximum as the price they will pay.

The only argument for usage based pricing I can think of that would work on consumers would be showing them their actual usage as evidence they are getting taken by fixed pricing.

And to think I started out with the idea that usage-based pricing was good

May 21, 2026

claude code's business model is like insurance

Most users of claude code do not fully use up their usage-limits most weeks, thus claude code functions kind of like insurance against needing a lot of tokens. When you do need them, they are cheap, but for many users they are paying in more than they get out. Also the inactive users subsidize the active ones to (I imagine) make it a profitable enterprise for anthropic

May 21, 2026

Coding w LLM: have the LLM write a test first in a separate commit, make sure the test fails and then succeeds once the patch is applied

May 21, 2026

Combatting the planning fallacy

If you know how long things have taken in the past, use that data
if you don't know that then apply a heuristic so that you end up in a place so pessimistic, you are 50% likely to take less time than the estimate
1. One deeply simple heuristic: be as pessimistic as you can reasonably manage, then double it
Try to decompose something big into it's parts (we tend not to do that automatically leading to course grain and bad estimates)
1. While doing this, you should account for T_unknown, the subtasks you anticipate (the idiom "once you do 90% you only have the second 90% left" applies)
2. Then apply your favorite heuristic from #2
You should also discount how much of an improvement you think something will be (we overestimate the benefit while underestimating the cost
Time boxing (useful because part of planning fallacy is Parkinson's Law)

Context for this

## Timeboxing: Parkinson's Law + Planning Fallacy Parkinson's

Further thinking

May 21, 2026

Cool software details in the Apple Music app

Apple Music radio shows change the cover art & artist as you listen
the playlists you save to your library update, including the ones that change weakly (I’m not sure if this is positive)
Apple animated album covers

May 21, 2026

Corollary: pushing yourself is as kind of selective activity, you shouldn't really make everything super stressful

Context for this

Relax for the same results - https://sive.rs/relax The

May 21, 2026

Corollary to the planning fallacy:

People underestimate what they can do in the long run (e.g., due to not understanding compounding)

This is really the same as people underestimating what can be done in general

Context for this

# Combatting the planning fallacy 1. If you

May 21, 2026

Github (and cursor & other cloud environments) should support interactive diffs. So you can edit the diffs with small changes as you go.

Further thinking

Allowing users to contribute to an extend existing

May 21, 2026

"He had no idea of its cause, still less of its cure; but discontent had come into his soul, and he had taken one small step toward humanity" (from 2001: A Space Odyssey)

May 21, 2026

how does iOS's system for auto-removing unused apps + redownloading them later? it's very seamless & preserves all data

May 21, 2026

Idea for tuning LLMs for translation & code / RLAIF

Give one instance of a LLM some rich context and produce a description of the code it would like.

Pass it to a second instance and then have it generate the code

Pass the code back and have it iterate until the code fits the context.

You now have (long context, short description, code pairs) which can be used to enrich the training

Context for this

LLMs are contextual meaning they can behave significantly

May 21, 2026

In general, people should schedule themselves for the future (ie, I will revisit this at X time, I will read this at Y time) etc to help them accomplish their goals, but they don't - they fail to do things that are best done later at the appropriate time

Either this is a psychological thing, or we just haven't made this advancement as a society yet

May 21, 2026

It's possible to do things slowly and still yield results, given they are the right things. Speed has become kind of an aesthetic thing in twitter technology culture

Maybe the underlying assumption is that you wish to gain wealth as fast as possible, but there are definitely some things that take longer than a few months

Context for this

We are always like fish - the water

May 21, 2026

LLM debuggers & sympathy

Many problems in LLM can feel incredibly inscrutable because, unlike code, we cannot trace the internal mechanics of an LLM. Thus, it is not possible to mechanically trace back to the source of an error.

However, it is simultaneously true that, at the current high levels of LLM capability, many errors in the LLMs capability are caused by poor context engineering (e.g., overloading the context window, not providing the LLM the right context for what to do in case of an error, not being careful enough about what makes it into the token stream)

Thus, while we cannot mechanically trace back errors we can intuitively understand them by creating an llm debugger that shows the context of the LLM.

I think this has a strong analogy to the idea of sympathy. We can either choose to be frustrated by poor and illegible outcomes from an LLM or aim to develop sympathy by using an LLM debugger to see what the LLM sees

implementation

A simple implementation is to wrap the anthropic or openai sdk and intercept the inputs and outputs, then display them in a nice way, including things like token usage at each step.

A better implementation allows us to to attach context to the tokens - e.g., to reveal if we are incorrectly presenting the underlying data to the LLM or if we are supplying too much of it

May 21, 2026

LLMs are contextual meaning they can behave significantly better or worse depending on the context. This provides opportunity for self-improvement within the same model

Further thinking

Idea for tuning LLMs for translation & code

May 21, 2026

P(X is hacked over next Y years) where x is github, X.com etc and exposes your personal data?

Prediction markets (maybe ones with no $at stake, since$ may incentivize a hack)

IMO, this is very high. I would put my current P(github is hacked over next 2 yearts) at 50%

Therefore, I probably should not trust github private repos with anything sensitive

May 21, 2026

Relax for the same results - https://sive.rs/relax

The idea is that you can the same results by not being super intense about everything.

I think part of the idea is that you don't bind up your ego with thing thing you do when you don't treat it as do or die

Context for this

We are always like fish - the water

Further thinking

Corollary: pushing yourself is as kind of selective

May 21, 2026

"Remember this browser" popups should happen after you log in! There's muscle memory to hit enter after you type your password which means I miss the little checkbox

May 21, 2026

"saying 'Someone else should run this company' is like saying 'Someone else be the husband to my wife', even if you think it you never say it!" - PG paraphrased by ryan petersen paraphrased by me

Can't attest to the accuracy, but it is hilarious!

May 21, 2026

Small affordance in the alarms section on iOS. When you modify an alarm it is also turned on. My engineer brain would say those are 2 separate things, but you only edit an alarm to set it. Good design doesn’t fall prey to my engineer brain.

May 21, 2026

Someone please make the contacts app on MacOS not dog slow (im on MacOS 26 but i think sequouia had the same issue)

May 21, 2026

something about LLM apis costing money (even though it's just a little bit) gets stuck in my craw when using them in tools. I think we will get over this.

For instance, I want to select titles of my thoughts from the contents (since a deterministic approach like taking the first few words doesn't always work). I feel hesitant to put an llm to work even though it would cost me pennies.

we thinking strangely when it comes to money!

May 21, 2026

"Technology isn't the same thing as science at all. And trying lots of different ways to do something isn't the same as experimenting to figure out the rules." There were plenty of people who'd tried to invent flying machines by trying out lots of things-with-wings, but only the Wright Brothers had built a wind tunnel to measure lift..."

Harry Potter and the Methods of Rationality

May 21, 2026

testing hypotheses in science

The goal is not to prove or disprove a hypothesis but gain maximum information about it. This could mean positive or negative tesing

(and I guess we have some threshold or randomization factor to determine how to act once we gain information)

Further thinking

## testing over the entire hypothesis space In

May 21, 2026

testing over the entire hypothesis space

In reality their is a set of hypotheses

you cannot enumerate all of them (there is always some $H_{other}$ )
any observation allocates probability to each of them

https://claude.ai/share/3062a9c1-48de-4919-a0fd-a906d7cf5453

Context for this

## testing hypotheses in science The goal is

May 21, 2026

The Bayesian VC

One would assume that many VCs would make use of Bayesian reasoning to understand the world. After all, they are in an environment with lots of signals that they must apply to companies. Being rash and saying "X failed in the past, therefore I will put 0% allocation into X" is a failure mode; therefore doing a bayesian update on P(X succeeds) seems prudent.

Somehow, I doubt many VCs do this, which is not a shot at VCs; it could be that it is not useful for a number of reasons. The one that I can think of is that most of the signals simply don't matter; as many are fond of saying, it's more about the founders.

However, given that VCs are very actively reacting to changes in the world, I don't think this is true.

May 21, 2026

The codex computer use model is incredibly elegant:

separate computer use mcp which also has all the UI that's specific to computer use (e.g., the cursor and the menubar icon)
integrate that mcp as just another skill

May 21, 2026

The ideal situation for a small team (2-3 people) is if 1 person becomes more vocal, then the other people do too. Lean into it.

Often it goes the opposite way, the other person shrinks (I personally have done this)

May 21, 2026

the isolation effect (von resteroff effect) for marketing

When you have a list of options, any option that has something different stands out even if the signal itself is weak

We have a stand-out bias - which can be exploited to make things stand out

May 21, 2026

"There are a few ways in which a man can be more innocently employed than in getting money"

Samuel Johnson (found via the Money Stuff Podcast ep on Jan 2, 2026)

May 21, 2026

Timeboxing: Parkinson's Law + Planning Fallacy

Parkinson's Law: Things take as long as you have to do them ("Deadlines are the condition of productivity")

Timeboxing: set a fixed period of time to work on some activity and stopping unconditionally

Why it works

Parkinson's law implies that setting a shorter time period may allow us to compress the schenanigans
You make progress on something, without the anxiety of completing it (which is known to be a hard problem to estimate, according to the planning fallacy)

Context for this

# Combatting the planning fallacy 1. If you

Further thinking

# Combatting the planning fallacy 1. If you

May 21, 2026

To build products that are as good as Apple in its heyday means you have to build the products first and then wait 10 years for opinion to catch up; being patient is crucial

May 21, 2026

We are always like fish - the water we are in imposes optimization criteria and constraints we can't detect.

For instance, in startups, maybe one implicit criteria is getting wealth or revenue as fast as possible. This is good signal, but it may not actually be algined the goal we set for ourself (e.g., to build a particular kind of thing we wish to see in the world etc)

Further thinking

May 21, 2026

When designing an improvement for a system, you typically don’t want it to get worse. Good strategy:

figure out an error rate for the old system
compute the agreement between the new system + the old system
if the agreement is significantly less than the error rate of the old approach you have a problem

May 21, 2026

"You tell me whar a man gits his corn pone, en I'll tell you what his 'pinions is."

Part of this means our incentives shape our constraints. (there is more too it - Twain also says all opinions originate in this way. See: https://paulgraham.com/cornpone.html for a more complete excerpt)

May 21, 2026

Listening to “Human Resources” by Dan Carlin, about slavery.

One is struck how much people were able to shape their opinions to their monetary incentives (eg, revolutionary France controlling Saint-Domingue, now Haiti)

It’s worth listening too because we are under informed about slavery, since it is avoided in media besides to make a current political point

https://podcasts.apple.com/us/podcast/dan-carlins-hardcore-history/id173001861?i=1000553133741

January 27, 2026

Small accordance in the alarms section on iOS. When you modify an alarm it is also turned on. My engineer brain would say those are 2 separate things, but you only edit an alarm to set it. Good design doesn’t fall prey to my engineer brain.

January 26, 2026