Episode 2

Published on:

30th Jun 2026

AI Coding Is Solved. Software Engineering Isn’t.

If coding is "solved," is software engineering? Ajay Medury (software engineer) and Andrew Sierota (systems engineer) pick up where Episode 1 left off and get into the part that isn't solved: judgment. They trade notes on why weekly usage limits have quietly become the real project budget, what it's like to build a sharded Minecraft world solo as both product manager and principal engineer, what Amazon's New World got wrong about scale, running decorrelated multi-model code reviews, and what an AI "skill" actually is. It might all just be an act of intelligence.

In this episode:

  • Why "coding is solved" but software engineering isn't, and why judgment is the expensive part
  • The new bottleneck: weekly subscription usage limits as a hard budget
  • Breaking a big build into modules and submodules, and shrinking scope to actually ship
  • Wearing every hat at once: product manager and principal engineer
  • New World and the problem of scale at launch
  • Large codebases, heavy test coverage, and review rounds that exposed process gaps
  • Pre-flight checks, linters, and spec-tracing that cut review loops down
  • Decorrelated reviews: several different models reviewing blind, then taking the union of findings
  • What an AI "skill" is: system prompts, user prompts, and guardrails for long workflows
  • Severity tiers for findings: blockers, warnings, defers, suggestions, and nits
  • Why systems admins, as generalists by trade, may be an ideal audience for these tools

Chapters:

  • (00:00:00) - Real intelligence, or just an act?
  • (00:01:02) - Coding is solved; software engineering isn't
  • (00:02:02) - Keeping up with the release pace
  • (00:05:27) - Vibe coding vs. a repeatable process
  • (00:06:40) - Usage limits are the new budget
  • (00:08:40) - Breaking the build into modules
  • (00:12:18) - A sharded world, every hat on one builder
  • (00:17:12) - New World and the problem of scale
  • (00:25:04) - 200k lines and a 90-round review
  • (00:27:20) - Pre-flight checks that cut 90 rounds to 5
  • (00:30:49) - The podcast's own local-GPU pipeline
  • (00:33:27) - Learning by asking "what do you mean?"
  • (00:35:35) - When a large agent run burned through the budget
  • (00:38:05) - What is a skill, really?
  • (00:48:05) - Skills as guardrails for long workflows
  • (00:51:09) - Severity tiers: blockers to nits
  • (00:54:34) - Why sysadmins are ideal builders
  • (00:56:29) - A long-running Minecraft community, the real driver
  • (01:00:56) - Closing: an act of intelligence
Transcript
Speaker:

Hello, my name is Ajay Medury, and I'm a software engineer, and today I'm

Speaker:

joined by...

Speaker:

My name is Andrew Sierota, and I'm a systems admin.

Speaker:

Awesome, and today we are here to talk about various topics

Speaker:

in the AI ML space for a podcast that we have coined

Speaker:

Active Intelligence, because we're trying to figure out if it's real intelligence or

Speaker:

is it just acting?

Speaker:

And this podcast is for all aspiring creators, creatives, and

Speaker:

builders.

Speaker:

Or those who have already been doing it for a while and just are looking for new

Speaker:

tools and maybe ways to improve their workflows.

Speaker:

I'm curious if, you know, one of the things we talked about last time was

Speaker:

particularly like, you know, software engineering, writing code might be a solved

Speaker:

problem, but is software engineering the solved problem?

Speaker:

And I think, yeah, I was curious.

Speaker:

Yeah, yeah, picking up where we left off on the last episode there, I remember us

Speaker:

saying that coding could be solved, right, but software engineering definitely

Speaker:

isn't, and I think I was touching on this while we were chatting just before the

Speaker:

show, you know, coding is basically, you know,

Speaker:

Claude can write code all day long, faster than anyone can

Speaker:

humanly.

Speaker:

Mm-hmm.

Speaker:

It'll figure it out if you give it enough time, enough tokens.

Speaker:

Oh, yeah.

Speaker:

But judgment isn't free.

Speaker:

Yes.

Speaker:

And that's where humans are still very valuable, is judgment.

Speaker:

And that can be really expensive.

Speaker:

It could be a really expensive mistake if you have poor judgment on the usage of

Speaker:

your code.

Speaker:

And talking about that in particular, here's like bad judgment in terms of

Speaker:

accidentally putting a vulnerability out there that could now all of a sudden be

Speaker:

discovered by models much more easily.

Speaker:

The cost of that is pretty intense.

Speaker:

Yeah, yeah.

Speaker:

And I don't remember what the, that, there was the project that Anthropic did with

Speaker:

like the 30 big companies.

Speaker:

Yes.

Speaker:

To like the pre-release of Mythos.

Speaker:

Yeah.

Speaker:

And they were supposed to like patch everything, supposedly, before they released

Speaker:

Fable.

Speaker:

Yes.

Speaker:

Right.

Speaker:

That's funny, 4.8, Opus 4.8 was only released like two weeks ago.

Speaker:

That's.

Speaker:

And then less than two weeks later, we have Fable now.

Speaker:

Three days later, we don't have Fable.

Speaker:

Which is really interesting to me because traditional software engineering took a

Speaker:

lot more time, had a lot more rituals potentially.

Speaker:

And, you know, again, we kind of broached the subject last time is, were those

Speaker:

rituals still meaningful?

Speaker:

Like, do those still make sense to do today?

Speaker:

Like, because I can't even imagine a time like four or five years ago where you'd be

Speaker:

able to release a, you know, pretty significant version and then release the next

Speaker:

major version within weeks later.

Speaker:

I think you'd be waiting months between these kind of releases.

Speaker:

So.

Speaker:

Yeah.

Speaker:

Kind of, kind of hard to keep up with, to be honest.

Speaker:

You know, there's, there's so much changing so quickly.

Speaker:

By the time this is released, you know, who knows what would have changed.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

I think that a lot of what we're doing now, and I guess

Speaker:

something that I've done, like, obviously the world's changing every day.

Speaker:

There's new tools, you know.

Speaker:

It's hard to, you know, want to use the latest and greatest all the time.

Speaker:

Yes.

Speaker:

And, but also keep adding new requirements.

Speaker:

To your project, right?

Speaker:

Yes.

Speaker:

Because, like, there was a point in time where in the, my Minecraft project right

Speaker:

now, I was just adding so much stuff because I was like, oh, yeah, like, this thing

Speaker:

can code everything for me.

Speaker:

Oh, yeah.

Speaker:

That's no longer a limit, but then once I started to get to, well, is it going to

Speaker:

work, right, there's just too many things to check.

Speaker:

Yeah.

Speaker:

And, and I think I remember last episode, I said, I think, you know, testing would

Speaker:

be 10 times as much as the production.

Speaker:

Yeah.

Speaker:

I'm actually thinking it's going to be 100 times more than the development now.

Speaker:

Yeah.

Speaker:

The realization is now dawning.

Speaker:

It's like, oh, no, this, there's a lot more.

Speaker:

I gave it the ability to build all this stuff.

Speaker:

The time it will take for me to now validate that, yeah, it just feels.

Speaker:

It's going to be a lot.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

And I think that's the interesting part is because back when, back when I was in my

Speaker:

previous company, we were a cloud company.

Speaker:

And so we were trying to launch systems for others to use.

Speaker:

The expectations there were always like.

Speaker:

Let's maybe be a little more restricted because if we can restrict the size of the

Speaker:

system that we're like actually shipping out there, we can maybe do a better job of

Speaker:

building it and reduce things like risk, which is I'm imagining a question that

Speaker:

Anthropic is asking right now is what is the risk of actually releasing this model?

Speaker:

So it becomes a similar question.

Speaker:

I think we had, we had those kinds of conversations all the time where it's like,

Speaker:

all right, do we actually do less intentionally?

Speaker:

And the answer at times was yes.

Speaker:

Yeah.

Speaker:

We should do less intentionally.

Speaker:

I don't also think that always applies towards other kinds of systems like our

Speaker:

projects, right?

Speaker:

I do feel like we were briefly talking about this before the podcast about like, oh,

Speaker:

what is, you know, like, what does software engineering look like versus vibe

Speaker:

coding?

Speaker:

And there is definitely a good number of things I can bring up when I start talking

Speaker:

about it.

Speaker:

Though I do want to say like one distinction we kind of were, maybe we liked the

Speaker:

idea of it is software engineering.

Speaker:

Like in the big tech companies, like a well-oiled process, it's a repeatable thing

Speaker:

that they keep repeating to keep, you know, churning out new features, new products

Speaker:

and so on and so forth.

Speaker:

However, I do think that the vibe coding and more like, you know, building

Speaker:

locally, building a system today, like a lot of engineers who do it outside of the

Speaker:

big tech, I feel like it's more like a project where the project is something you

Speaker:

kind of figure out how to execute the project as you go along.

Speaker:

There isn't just an answer for every single thing.

Speaker:

Like you don't get told like, oh, this is where you, you know, release the code,

Speaker:

this is where you talk to next.

Speaker:

You know, you don't go step one, step two, step three with the actual like new world

Speaker:

of building systems, building like projects.

Speaker:

I do feel like there's a lot more variance.

Speaker:

And Andrew, it sounds like, sounds like you're kind of experiencing some of that,

Speaker:

right?

Speaker:

If I'm getting, if I'm getting it right.

Speaker:

Well, absolutely.

Speaker:

And it's funny, like, like in enterprise, you have budgets, you have deadlines, you

Speaker:

have a boss who's breathing down your neck.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

But, but when, when you have your own project, all of a sudden, the bigger

Speaker:

limit is, especially when you're not paying for API, you're paying for a

Speaker:

subscription like Claude Max, Codex Max.

Speaker:

The biggest, the next big limit is your usage every week, your usage in every five

Speaker:

hour window.

Speaker:

And that's actually the budget now that I'm working with.

Speaker:

Like, like I have to calculate, like I got a 20x Max subscription for Codex and

Speaker:

Claude.

Speaker:

I can use up a weekly limit in two days.

Speaker:

And I have two subscriptions.

Speaker:

So I can code for four days a week.

Speaker:

That's your budget.

Speaker:

Yeah.

Speaker:

That's my budget.
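
The weekly arithmetic described here can be sketched in a few lines. This is only an illustration, assuming each subscription's weekly limit lasts about two days of heavy use and that the two subscriptions are used one after the other. The `Subscription` type and the `coding_days_per_week` function are hypothetical names, not part of any vendor tooling.

```python
from dataclasses import dataclass


@dataclass
class Subscription:
    name: str
    # Days of heavy use before the weekly limit runs out (an assumed figure).
    days_to_exhaust_weekly_limit: float


def coding_days_per_week(subs, days_in_week=7):
    """Total usable coding days if subscriptions are drained back to back."""
    total = sum(s.days_to_exhaust_weekly_limit for s in subs)
    # A week only has so many days, no matter how many plans are stacked.
    return min(total, days_in_week)


subs = [Subscription("Claude", 2), Subscription("Codex", 2)]
print(coding_days_per_week(subs))  # 4
```

Adding a third plan, or stretching each limit further, raises the total, but the result is capped at seven days.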

Speaker:

And, and then I have to think, well, that's just testing, reviewing the code.

Speaker:

And I haven't even really reached the stage where I'm doing practical tests.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

Real, real world.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

And so that's why I said, I'm like, wow, even with this process, I'm going to be

Speaker:

able to do it.

Speaker:

But as I'm practically fully automated now, it's still going to take a lot of time

Speaker:

because budgets aren't infinite for most projects.

Speaker:

Yes.

Speaker:

Yes.

Speaker:

And I think that's the idea of like within this budget for this project, how can I

Speaker:

actually figure out to execute the thing that I care about?

Speaker:

And I think that's the process of like, oh, wow.

Speaker:

I'm saying process project over and over again.

Speaker:

Maybe I'll say like, there is a self-reflection that needs to happen in projects to

Speaker:

feel like, Hey, what is, what is done?

Speaker:

done.

Speaker:

Like when do I actually, as you said, like earlier in your project, you're actually

Speaker:

like do more.

Speaker:

And eventually you started to realize like, okay, this might be a little too much

Speaker:

because then the amount of stuff that I can validate and actually make sure the

Speaker:

quality is good might be growing so quickly.

Speaker:

Then you have to take a judgment call, which actually lets you decide, or then you

Speaker:

have to, the AI won't do this for you.

Speaker:

So as a builder, you need to take a judgment call, say that, no, we're going to stop

Speaker:

here.

Speaker:

We're going to actually now figure out the validation.

Speaker:

We're going to start figuring out if everything's working.

Speaker:

At least I think that's the way I'm thinking about it.

Speaker:

I don't know if you feel a similar, like, do you feel like that's the process?

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

I, to go along with that for me, how I've kind of thought about it is, you know, I

Speaker:

broke the project down into phases, right?

Speaker:

And I was like, okay, we need, what can we start out with?

Speaker:

You know, an MVP, right?

Speaker:

Like what can we start out with?

Speaker:

Even that is pretty big in scale.

Speaker:

But at least that once that's done, there's a foundation for the later updates.

Speaker:

Right.

Speaker:

Um, and so right now I think that's why this first phase, even though I've already

Speaker:

reduced the scope, like I have like 16 planned modules, I'm only coding like, you

Speaker:

know, eight of them.

Speaker:

And one of them is a really big commons module.

Speaker:

Yeah.

Speaker:

But the thing is that they're being coded, but a lot of them are just skeletons for

Speaker:

the next phases.

Speaker:

I mean, I think, uh, I may say eight modules, eight modules.

Speaker:

Yeah.

Speaker:

I actually feel like that's a good thing.

Speaker:

Usually being able to break it down into smaller pieces.

Speaker:

So you actually then go and if something breaks, you want to ideally be able to

Speaker:

focus on one module.

Speaker:

Oh, absolutely.

Speaker:

Absolutely.

Speaker:

And do you feel like that's kind of the, it has worked pretty effectively?

Speaker:

So, so one of the modules that I have, um, has been broken down into like five sub

Speaker:

modules.

Speaker:

All right.

Speaker:

Okay.

Speaker:

Cause, cause there were, there were big enough elements individually

Speaker:

that I thought, you know, we need to break this down a little bit more, but for the

Speaker:

sake of the project planning, those modules are now, instead of 04,

Speaker:

04a, 04b, 04c.

Speaker:

I don't want to go ahead and change all the other module numbers to squeeze those

Speaker:

in.

Speaker:

Yes.

Speaker:

Yes.

Speaker:

You don't want to renumber everything.

Speaker:

You just want to let it be a submodule of the existing.

Speaker:

Exactly.

Speaker:

So those are just submodules now.

Speaker:

Um, at the end of the day, um, it's all about, can we maintain it?

Speaker:

Um, yeah.

Speaker:

And I think interestingly, what you're experiencing is a, there, there's a little

Speaker:

bit of a mirror.

Speaker:

There's like the other side of the coin.

Speaker:

If you look at software engineering, traditionally, I think we were saying there's

Speaker:

processes that one follows, uh, and the interesting thing is like what I've

Speaker:

experienced is when you're trying to actually build something, build something new,

Speaker:

usually you're getting requirements from somebody.

Speaker:

You're actually getting requirements from a product manager or from leadership

Speaker:

saying that, Hey, we've identified, uh, this opportunity in the market.

Speaker:

Can you go scope it out?

Speaker:

Can we actually go figure out the process actually entails that, right?

Speaker:

Like the process says, uh, leadership identified an opportunity. The product manager

Speaker:

now goes and figures out what that opportunity is like, how big is it?

Speaker:

How much scope is it?

Speaker:

How many things need to be built out to capture that opportunity?

Speaker:

And then the software engineering folks get pulled in the senior, you know,

Speaker:

principal folks get pulled in, uh, where they're like, okay, what modules do we need

Speaker:

to actually make this?

Speaker:

Oh, do we need new products?

Speaker:

Do we need eight modules?

Speaker:

Do we need three?

Speaker:

Like, is one good enough?

Speaker:

Uh, and interestingly, uh, you know, there's a lot of

Speaker:

things we can do with confused systems.

Speaker:

There's a lot of things we can do with skilled systems.

Speaker:

And even with, uh, you know, like the, uh,

Speaker:

the, the, the, the, the applications that we, that we wanted to very, very much, we

Speaker:

didn't want to, we wanted to shift that to the, to the, uh,

Speaker:

you know, the, and the, the second one, we were, we were really thinking

Speaker:

about how we can start working with new things.

Speaker:

All of these things are very different than what it would take traditionally the big

Speaker:

tech companies to do, right?

Speaker:

Because they need individuals to write the code in the past.

Speaker:

Maybe not anymore.

Speaker:

Yeah, yeah.

Speaker:

Hmm.

Speaker:

I think, you know, as I've been letting Claude code it, planning the

Speaker:

project out, having the spec sheets and everything like that, I begin to realize

Speaker:

that I kind of wish I actually shrunk it even more.

Speaker:

Shrunk it even more?

Speaker:

Like a lot more, actually.

Speaker:

A lot more.

Speaker:

Because right now I have, like, if we, I'm not sure we talked about, like, the

Speaker:

architecture of it all, but there's going to be, it's going to be a sharded system.

Speaker:

We're going to have nine separate worlds.

Speaker:

They're each going to have very smooth transitions for players to go between them.

Speaker:

But that requires a lot of extra networking code on top of Minecraft already.

Speaker:

And I was thinking, I could have actually just done just one server.

Speaker:

Single host, yeah.

Speaker:

Single host.

Speaker:

Exactly.

Speaker:

Ignored all the extra networking stuff.

Speaker:

And, you know, I'd probably already be in game testing right now.

Speaker:

Mm-hmm.

Speaker:

Because I think that that extra layer, on top of all the other things that I wanted,

Speaker:

Yes.

Speaker:

is actually, that's the complicated part that Claude is spending a lot of time on.

Speaker:

Yeah.

Speaker:

There's a lot of gotchas that, on the first pass, it's not going to catch.

Speaker:

Yes.

Speaker:

And I thought about it.

Speaker:

I was like, maybe I should.

Speaker:

But, you know, I've already spent so much.

Speaker:

So you are the product manager and the principal engineer, like, dealing with this

Speaker:

at the same time.

Speaker:

Yes, and I'm like, okay, well, it's going to be worth it when it works, but I will

Speaker:

get back to you on when it works.

Speaker:

And this is where I do feel like the traditional software engineering and big tech

Speaker:

would have been like, oh, no, we failed.

Speaker:

Because the process should have already caught this at some point.

Speaker:

So I feel like that's the interesting differentiation I see right now.

Speaker:

Which is not always to say it was a good thing.

Speaker:

Because I do feel like going through this process of, like, or the project approach

Speaker:

of, like, let's go start, let's just see what we can build, and, you know, let's

Speaker:

make it unrestricted, right?

Speaker:

We're going to learn a lot more.

Speaker:

And I think that means we're actually going to figure out things more individually.

Speaker:

And I'm curious if you feel like doing the project in the way that you have done it

Speaker:

so far has actually taught you a lot more of what you would avoid next time.

Speaker:

And because you are mentioning that maybe you should have started smaller and, like,

Speaker:

started more restrictive.

Speaker:

There's, like, history in software engineering that kind of points us to some of

Speaker:

this stuff, right?

Speaker:

And even then, a lot of software engineers don't actually follow it.

Speaker:

A lot of companies don't actually follow it.

Speaker:

There's no guarantee it's going to actually be repeatable and working.

Speaker:

So individual builders now have the same similar power.

Speaker:

And I'm curious, are you, like, do you feel like this is one of those moments where

Speaker:

you're like, all right, I'm going to go ahead and let this run as it does.

Speaker:

But next time, I'm actually going to do single host or try and figure out what

Speaker:

single host looks like.

Speaker:

Yeah, I have a couple ideas of new projects, I suppose.

Speaker:

And I thought about it a little bit.

Speaker:

I do think that the stuff I'm doing now is informing future

Speaker:

projects.

Speaker:

I definitely would have done it a lot differently.

Speaker:

And another part of it is I constantly do change what I do

Speaker:

all the time.

Speaker:

Like, half of the time.

Speaker:

Half of my time is spent actually optimizing the workflow, thinking about where can

Speaker:

we cut costs?

Speaker:

Where can I cut usage?

Speaker:

For example, I have directed Opus to actually use

Speaker:

Sonnet to implement fixes now.

Speaker:

Yes.

Speaker:

Because, I mean, Opus will write the, you know, the plan and then Sonnet's pretty,

Speaker:

pretty good at implementing it.

Speaker:

And at the end of the day, Opus will still review the changes.

Speaker:

So I'm still gating it behind the frontier model.
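
The plan, implement, review loop described here can be sketched as a small function. This is a rough illustration under stated assumptions, not the actual workflow from the project: `run_fix` and the three stub callables are hypothetical stand-ins, and no real model API is called. The idea is that a stronger model plans and reviews while a cheaper one implements.

```python
from typing import Callable, List, Tuple

Plan = List[str]


def run_fix(
    issue: str,
    plan: Callable[[str], Plan],
    implement: Callable[[Plan], str],
    review: Callable[[str, str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Plan once, then loop: implement, review, and feed review notes back in."""
    steps = plan(issue)
    for attempt in range(1, max_attempts + 1):
        patch = implement(steps)
        approved, feedback = review(issue, patch)
        if approved:
            return patch, attempt
        # The reviewer's note becomes an extra step for the next attempt.
        steps = steps + [feedback]
    raise RuntimeError(f"no approved patch after {max_attempts} attempts")


# Stand-in callables so the sketch runs without any model behind it.
def stub_plan(issue: str) -> Plan:
    return [f"fix: {issue}"]


def stub_implement(steps: Plan) -> str:
    return "\n".join(steps)


def stub_review(issue: str, patch: str) -> Tuple[bool, str]:
    # This toy reviewer only approves patches that mention a timeout.
    return "add timeout" in patch, "add timeout"


patch, attempts = run_fix("handoff hangs", stub_plan, stub_implement, stub_review)
print(attempts)  # 2
```

In a real setup the three callables would wrap whichever models fill each role. The review step is what keeps the cheaper implementer gated behind the stronger model.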

Speaker:

And it's not just Opus, right?

Speaker:

If I'm not mistaken.

Speaker:

Yeah.

Speaker:

Right.

Speaker:

And it's not just Opus.

Speaker:

Yes.

Speaker:

There's other reviewers.

Speaker:

I have GPT 5.5 as well.

Speaker:

DeepSeek.

Speaker:

It's very cheap.

Speaker:

It's very cheap.

Speaker:

It's very cheap.

Speaker:

You're welcome, China.

Speaker:

But, yeah, I think the decorrelated reviews have saved a lot of money,

Speaker:

actually, because I'm having three different models review the code at the same

Speaker:

time.

Speaker:

And there's a lot of use.

Speaker:

There's a lot of unions in their findings, and there's a lot of findings that they

Speaker:

individually would not have picked up.

Speaker:

They're all trained on different data, and that gives you three different

Speaker:

perspectives.
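
The idea of taking the union of several blind reviews can be sketched like this. It is a minimal illustration, assuming each reviewer's output has already been normalized into comparable (file, issue) pairs. The reviewer names, file paths, and the `merge_findings` function are hypothetical.

```python
from collections import Counter


def merge_findings(reviews):
    """reviews maps a reviewer name to a set of (file, issue) findings.

    Returns the union of all findings, plus the subset that only a single
    reviewer reported.
    """
    counts = Counter()
    for findings in reviews.values():
        # Each reviewer's findings are a set, so a finding counts once per reviewer.
        counts.update(findings)
    union = set(counts)
    found_by_one = {finding for finding, n in counts.items() if n == 1}
    return union, found_by_one


reviews = {
    "reviewer_a": {("net/handoff.py", "missing timeout"), ("core/cache.py", "stale entry")},
    "reviewer_b": {("net/handoff.py", "missing timeout"), ("core/auth.py", "unchecked token")},
    "reviewer_c": {("core/cache.py", "stale entry")},
}

union, found_by_one = merge_findings(reviews)
print(len(union))            # 3
print(sorted(found_by_one))  # [('core/auth.py', 'unchecked token')]
```

The findings that only one reviewer reported are the ones a single-model review would have missed, which is the case for running several different models rather than one model several times.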

Speaker:

Perspectives, yeah, yeah, I love that.

Speaker:

And I think that's really important when you're trying to build a system that's

Speaker:

robust, and that's actually what I'm trying to do with Minecraft.

Speaker:

Like, the technology behind what's going on here is intentional to be robust,

Speaker:

because there is a lot of different communities that have done similar things.

Speaker:

But not to this scale, and not, like, even

Speaker:

normal MMOs have failed at doing this.

Speaker:

Actually being able to scale out.

Speaker:

Yeah, actually being able to scale and to, like, allow, like, seamless gameplay.

Speaker:

It remains to be seen if I can accomplish this myself, because I get a little

Speaker:

concerned, thinking, like, why hasn't, you know...

Speaker:

I was going to say, for our listeners, can you maybe, like, do you have a specific

Speaker:

thing that you can talk about where the scale was required?

Speaker:

Yes, yes, what was that game?

Speaker:

Amazon Game Studios, we played it a little bit.

Speaker:

Oh, yes, yes, New World, New World.

Speaker:

New World, and see, that was the game that I thought was, I thought Amazon was going

Speaker:

to solve this problem.

Speaker:

And when you say the solve this problem, which problem, if I may say, the problem in

Speaker:

particular.

Speaker:

The problem of scale.

Speaker:

The problem of when your game releases, you have millions of players that arrive,

Speaker:

and all of a sudden you have a 30,000 player queue.

Speaker:

Yes, yes, yes, yes, okay, yeah, yeah.

Speaker:

And not only that, it's not one big world.

Speaker:

There's hundreds of worlds that have lines.

Speaker:

Yes.

Speaker:

They're crashing left and right.

Speaker:

And I was like, come on, AWS, Amazon, had to have had the resources and know-how

Speaker:

to do this.

Speaker:

But yet, they made the same mistake as every other predecessor before them.

Speaker:

And I think that's the really interesting part, is, like, I'm also curious, like, if

Speaker:

I were to go back and ask them that question, what part broke, right?

Speaker:

Like, was it the fact that now you would have to...

Speaker:

Just have n number of players on the map at the same time?

Speaker:

Was it that they're trying to communicate with each other over voice or something?

Speaker:

And that was essentially what was causing the breakage?

Speaker:

I'm curious of, like, what was their bottleneck?

Speaker:

Because, sorry, yeah, you had something in mind?

Speaker:

I do have something in mind.

Speaker:

Like, I played a lot of different MMOs.

Speaker:

A sharded system is really common, right?

Speaker:

The issue, I think, that Amazon...

Speaker:

That Amazon had with New World was they built the game like it was any other MMO.

Speaker:

They did not take advantage of their expertise.

Speaker:

From the get-go.

Speaker:

Yeah, they built their own game engine, but they didn't do anything unique.

Speaker:

Yeah.

Speaker:

They didn't structure it in a way where they could scale it automatically.

Speaker:

Yes.

Speaker:

Right.

Speaker:

And maybe they did, but it didn't work.

Speaker:

The process failed.

Speaker:

It didn't work.

Speaker:

The risk was not assessed properly.

Speaker:

Yes.

Speaker:

I mean, there were...

Speaker:

We played launch.

Speaker:

We literally could not play for a few days.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

We actually gave up on the weekends.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

We had to wait a few days.

Speaker:

And to me, that's millions of dollars being lost.

Speaker:

Yeah.

Speaker:

I mean, I think New World would be a completely different game.

Speaker:

If it was built with that scale from the get-go.

Speaker:

If it was built properly from the get-go.

Speaker:

Yeah.

Speaker:

And I think that's the interesting, like, trade-off there as well.

Speaker:

Like, this is...

Speaker:

My understanding is also, like, traditional software engineering would also ask you

Speaker:

that question is, do you know if you need that scale?

Speaker:

Like, do you know if the marketing has been effective enough?

Speaker:

And do you know if the product...

Speaker:

Does the product manager actually...

Speaker:

Have they talked to the marketing department and seen an insane amount of, like,

Speaker:

interest?

Speaker:

And have they been able to calculate the amount of interest to then inform the scale

Speaker:

decision?

Speaker:

Because it is very common for software engineering teams during this process of

Speaker:

building the product to ask this question of, like, hey, do we want to be scalable

Speaker:

to, like, two million players on day one?

Speaker:

Or do we want to be scalable, you know, like, one world will be scalable to, like,

Speaker:

100,000 users at a time, that kind of thing.

Speaker:

And they make these decisions so that they can, you know, like, punt some of the

Speaker:

very complicated, very difficult things, in this case being, like, instead of

Speaker:

splitting this module into six different sub-modules, module three into six

Speaker:

different sub-modules, I'm just going to make module three into two sub-modules for

Speaker:

today.

Speaker:

And that'll satisfy my needs for now.

Speaker:

In this case, it feels like that process was a failure because somewhere, somehow

Speaker:

they didn't understand that the demand was so high and one of the core values as

Speaker:

gamers, as, like, people who enjoy playing games, waiting to get in to play your

Speaker:

game is a game-breaking experience.

Speaker:

Especially when you're super excited and you pre-ordered the game.

Speaker:

Yeah, and you paid extra.

Speaker:

Yeah, you paid extra.

Speaker:

And it's such a shame because I actually did, like, once we finally did play.

Speaker:

Yeah, it was fun.

Speaker:

It was fun.

Speaker:

But the game quickly died off

Speaker:

because, I mean, you had millions of players who could not play.

Speaker:

Yes.

Speaker:

Day one.

Speaker:

And then sometimes your friends would end up on the other server or the other world.

Speaker:

Yeah.

Speaker:

And they couldn't come and join you.

Speaker:

So all of a sudden, one of the main reasons I play games is a social, like, it's a

Speaker:

social thing for me.

Speaker:

I want to play with other people.

Speaker:

I want to play with my friends.

Speaker:

And if I can't play with my friends, I'm going to find it much more difficult to return.

Speaker:

Absolutely.

Speaker:

So I do genuinely, like, question.

Speaker:

That's the traditional software engineering method.

Speaker:

So, like, waits for such a long span because the cost of solving these problems tend

Speaker:

to be, again, you're making commitments to your boss.

Speaker:

Yeah.

Speaker:

You're answering to leadership.

Speaker:

You're saying that, all right, leadership is saying that, OK, you have a budget of

Speaker:

these many people for these many weeks.

Speaker:

And if you can get the game released in those weeks, great.

Speaker:

Otherwise, we're going to maybe, like, go a few, you know, not give you a promotion,

Speaker:

whatever it is, right?

Speaker:

The value proposition is so different versus I have seen indie games that have been

Speaker:

so successful and they haven't.

Speaker:

But I also do understand that they don't have that same pressure of, like, you know,

Speaker:

corporate top down of, like, you need to release this soon, they will release it

Speaker:

when they want.

Speaker:

So I'm actually also curious.

Speaker:

Do you feel pressure to release your project?

Speaker:

So thankfully, I actually have not released public information on this yet.

Speaker:

Oh, so if someone finds this podcast, they will recognize my voice.

Speaker:

Then they're going to start asking questions immediately.

Speaker:

There are some people who know that it's coming.

Speaker:

I haven't given any hard timelines myself because this is,

Speaker:

you know, a project that I'm figuring out as I'm going along.

Speaker:

Absolutely.

Speaker:

But there is pressure because I think I set up a lot of my own

Speaker:

deadlines in my head.

Speaker:

Right.

Speaker:

I have a lot of expectations of where I should be by a certain time.

Speaker:

And that's part of the workflow.

Speaker:

And I'm like, OK, I'm going to trim this.

Speaker:

I'm going to, you know, I'm going to accept that, you know, a lot of these things

Speaker:

aren't exactly as I want them, but I'm going to leave it and we're going to move on

Speaker:

and just try to get this working right now.

Speaker:

And I actually think I'm trying to get to in-game testing as fast as possible,

Speaker:

because like you were saying, like before the podcast, the practical testing can

Speaker:

be way more efficient than just, you know, traditional

Speaker:

tests.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

And I think that's a really interesting topic.

Speaker:

Actually, we will jump into that a little more after.

Speaker:

I do want to ask.

Speaker:

But you don't feel at this point you don't feel like you would compromise on certain

Speaker:

aspects, though, despite the time pressure, there are certain things that you are

Speaker:

very much like this is a critical thing based on your experience.

Speaker:

Yes.

Speaker:

Yeah.

Speaker:

Based on my experience, there's a lot of things at this point that I'm I'm holding

Speaker:

on to, which is really interesting to me, because when you talk about big

Speaker:

corporations and Amazon releasing it, the distance between the person who actually

Speaker:

understands the experience and the person building the experience is actually

Speaker:

non-trivial.

Speaker:

So I do feel like in big tech or in generally like big software organizations, that

Speaker:

is something that is I I'm really excited about the, you know, like coding tools and

Speaker:

all of these things becoming much more democratized, because the person who actually

Speaker:

understands the most about the experience now can actually ask direct questions

Speaker:

about, like, which parts of the experience are actually going to be implemented

Speaker:

versus not.

Speaker:

And I feel like I made the joke by you being the PM and the, you know, sometimes

Speaker:

that's I do feel like that that's a good thing, because I actually feel like you're

Speaker:

able to not only understand, but ask questions and actually also get into focus in

Speaker:

the right way.

Speaker:

Absolutely.

Speaker:

Though the testing part is still like a huge, you know, open box.

Speaker:

And I guess I'm curious.

Speaker:

So, like, we were talking about certain number of lines of test versus code.

Speaker:

And do you want to share?

Speaker:

Yeah.

Speaker:

Yes.

Speaker:

I think

Speaker:

Project now has approximately two hundred thousand lines of code.

Speaker:

And it's not exactly fifty-fifty between test and code, but it's close.

Speaker:

It's about fifty fifty.

Speaker:

It's going to be by the end of it.

Speaker:

I also meant the two hundred thousand lines is non-trivial.

Speaker:

Yeah, that that is a lot.

Speaker:

And I'm clearly not, you know, hand reviewing any of this.

Speaker:

But there are plenty of standards and conventions that we talked about last time that

Speaker:

are being taken into consideration during the iterative review rounds.

Speaker:

And actually to speak on that, give a little bit more since I've actually interfaced

Speaker:

with the project since then, quite a bit.

Speaker:

One module took 90 review rounds to converge into

Speaker:

what I at a certain point I actually had started to strip away requirements for the

Speaker:

testing.

Speaker:

Is that the module that you broke up or is that the module?

Speaker:

That's the module that I wrote.

Speaker:

OK, OK.

Speaker:

That explains all.

Speaker:

That does explain all.

Speaker:

Yeah.

Speaker:

It did finally converge.

Speaker:

Into where it was mostly like comments and code were just not consistent with

Speaker:

previous changes and at a certain point, I'm like, OK, we're going to keep finding

Speaker:

things forever.

Speaker:

Yeah.

Speaker:

And so I'm like, I think this is a good place to stop.

Speaker:

Yes.

Speaker:

And actually, since then I went through a review of the review process

Speaker:

with the review cycle.

Speaker:

Yeah.

Speaker:

With Claude.

Speaker:

And I said, OK, let's take a look.

Speaker:

I had it log all 90 rounds.

Speaker:

You reflected on the review process of, like, these 90 iterations.

Speaker:

OK, well, OK, yeah.

Speaker:

Tell us more.

Speaker:

I had it.

Speaker:

I had it, you know, from the get go log all 90 rounds.

Speaker:

It logged everything from the three different models that I used to review

Speaker:

things.

Speaker:

Really good space to do a reflection. 90 rounds is a lot.

Speaker:

I was like, I was like, I burned a lot.

Speaker:

Like

Speaker:

and so it came back and looked through everything and it gave me.

Speaker:

So I was like, what?

Speaker:

I asked it.

Speaker:

You know, simple English.

Speaker:

Yeah.

Speaker:

What can we do to lower the number of rounds?

Speaker:

Give me the executive review.

Speaker:

Yeah.

Speaker:

And it came back with a lot.

Speaker:

I'm not exactly familiar with maybe all the terminology specifically.

Speaker:

But there were things, like, I remember it saying it'll do like a

Speaker:

pre-flight check.

Speaker:

It will, like, trace all the methods and classes ahead of time against the specs.

Speaker:

It will, you know, it will have an index of what it needs to look for.

Speaker:

It added a few linters.
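A pre-flight check of the kind described here might look something like the sketch below. It is only an illustration in Python, not the actual workflow from the project: it indexes the classes and methods defined in a source file and compares them against a list of names the spec expects, so gaps show up before a review round starts. The names `ShardRouter`, `load_config`, and `SPEC_NAMES` are hypothetical.

```python
import ast


def defined_names(source: str) -> set[str]:
    """Index top-level functions, classes, and Class.method names in a module."""
    tree = ast.parse(source)
    names: set[str] = set()
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            names.add(node.name)
        elif isinstance(node, ast.ClassDef):
            names.add(node.name)
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    names.add(f"{node.name}.{item.name}")
    return names


def preflight(source: str, spec_names: set[str]) -> dict[str, list[str]]:
    """Compare what the code defines against what the spec says should exist."""
    found = defined_names(source)
    return {
        "missing_from_code": sorted(spec_names - found),
        "not_in_spec": sorted(found - spec_names),
    }


# Hypothetical module and spec index, for illustration only.
SOURCE = '''
class ShardRouter:
    def route(self, player_id):
        return player_id % 4

    def _debug(self):
        pass

def load_config(path):
    return {}
'''

SPEC_NAMES = {
    "ShardRouter",
    "ShardRouter.route",
    "ShardRouter.rebalance",
    "load_config",
}

report = preflight(SOURCE, SPEC_NAMES)
print(report)
```

Running a check like this before every round is cheap, and it catches the kind of "should have been found the first time" mismatch without spending a model call on it.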

Speaker:

OK, yeah.

Speaker:

Yeah.

Speaker:

That's good.

Speaker:

It did.

Speaker:

So those are improvements.

Speaker:

Yeah, yeah.

Speaker:

Yeah.

Speaker:

And it's actually interesting.

Speaker:

The next module that it reviewed only took five rounds.

Speaker:

That's 90 to five.

Speaker:

That's that's pretty.

Speaker:

That's pretty.

Speaker:

OK, I must also play devil's advocate and ask how big was the other one?

Speaker:

I would say that they were similar.

Speaker:

But here's the thing.

Speaker:

Here's the thing.

Speaker:

There's another reason why.

Speaker:

Yeah, because a lot of the rounds were

Speaker:

finding things that it should have picked up the first time.

Speaker:

OK, OK.

Speaker:

Right.

Speaker:

And that's where those things come in, like tracing the comments back.

Speaker:

Yeah, the linters, all of these things really did reduce a lot of the

Speaker:

noise.

Speaker:

So interestingly, I recently read this as well, as I was going through Claude's

Speaker:

updated documentation online.

Speaker:

They actually, I think a while ago, put out this user guide and I think it's pretty

Speaker:

buried.

Speaker:

Unfortunately, I do feel like it's a little buried.

Speaker:

One of their strong recommendations is plan first always.

Speaker:

But even before planning, you should actually ask it to research and understand what

Speaker:

this module is doing or what this code looks like.

Speaker:

How does it actually trace down a particular feature?

Speaker:

So it's like, OK, how does your authentication flow work?

Speaker:

Like would be a good question.

Speaker:

Right.

Speaker:

And that actually does pre-work.

Speaker:

It says that, all right, capture all the information about the authentication flow,

Speaker:

because then I know exactly where I need to make the updates.

Speaker:

Sounds like you've experienced that firsthand now.

Speaker:

Yes, yes.

Speaker:

And like I was saying last show,

Speaker:

there's a lot of things I'm finding out by brute force, like I'm developing the

Speaker:

process that, you know, a software engineer would have known.

Speaker:

Yeah.

Speaker:

Or would have been instructed to do, more than even know, like when

Speaker:

told, turn your brain off.

Speaker:

Just follow the process.

Speaker:

Yeah.

Speaker:

Which is, I do feel like that's actually a very interesting distinction I want

Speaker:

to get back into later is

Speaker:

you're learning the reason why the judgment exists or the process exists.

Speaker:

You're using your judgment and then getting Claude to give you the right information

Speaker:

so that you can take the correct judgments that, you know, like probably engineering

Speaker:

teams have been doing ad nauseam across time.

Speaker:

And that ends up becoming either tribal knowledge or becomes very strict process.

Speaker:

It's what it feels like to me.

Speaker:

Like, that's that's what I'm hearing almost.

Speaker:

I'm curious how many more like do you feel like we also talked about using existing

Speaker:

tools, we also talked about like some project or so get you done as a repository

Speaker:

that I keep I keep talking, yes, yes, we'll we'll we'll see if that gets flagged in

Speaker:

some.

Speaker:

That is probably what got flagged on the podcast.

Speaker:

OK, that is probably all right.

Speaker:

So we were trying to syndicate our podcasts across things and the tool's name has

Speaker:

now become a problem.

Speaker:

So I'm sorry.

Speaker:

Andrew, you're going to have to find it.

Speaker:

You're going to have to get OK.

Speaker:

We're going to get flagged again.

Speaker:

Oh, it's an official.

Speaker:

I'll ask Claude to beep it out.

Speaker:

Yeah, yeah.

Speaker:

If it can do it, that would be fantastic.

Speaker:

That would be interesting.

Speaker:

And would also at some point love to learn more about the whole process.

Speaker:

We should be talking about that at some point, too.

Speaker:

Yes, yes, absolutely.

Speaker:

There's a whole process that I've used to normalize the audio to transcribe

Speaker:

per speaker.

Speaker:

That's yes, that's pretty amazing.

Speaker:

The podcast.

Speaker:

I did.

Speaker:

I did read the transcript, at least blurbs here and there.

Speaker:

I was like, OK, it's pretty good.

Speaker:

It was impressive.

Speaker:

And it was done with local compute as well.

Speaker:

Yeah, on GPU.

Speaker:

That's another that's a successful project right there.

Speaker:

Yes.

Speaker:

Anyway, sorry, coming back into this.

Speaker:

Do you feel like there are you're learning a lot of things based on your individual

Speaker:

judgment as well and you're learning a lot about the tool, like, is what I feel like

Speaker:

is most likely happening.

Speaker:

So maybe I should ask you that question.

Speaker:

You're right here.

Speaker:

Do you feel like.

Speaker:

You're learning a lot more about the tool and the process that you would follow and

Speaker:

building now with the modern AI tooling?

Speaker:

Definitely.

Speaker:

I guess, too, I think that was a really broad question.

Speaker:

It was very broad, I'm sorry.

Speaker:

So I was like, oh, where do I start with this?

Speaker:

I guess, do you have something a little bit more specific to help me nail something down?

Speaker:

Yeah, yeah, definitely.

Speaker:

Definitely.

Speaker:

And without spending too much.

Speaker:

So like we talked about reviews, we talked about basically analyzing the codebase

Speaker:

ahead of time, basically doing research, pre-research ahead of time, then planning

Speaker:

and then executing.

Speaker:

Do you feel like there are more examples where you're like, OK, this is taking way

Speaker:

longer, this was too big, and I think maybe more like it

Speaker:

didn't understand me here clearly and it did all these things as well.

Speaker:

One of the things I think you mentioned to me earlier was you did tell Claude to

Speaker:

exercise its own judgment.

Speaker:

And that is something you're still working on figuring out if that was effective or

Speaker:

not.

Speaker:

Right.

Speaker:

Yes.

Speaker:

So I guess, yeah, I remember talking about this last show.

Speaker:

Like, there's a lot of things that Claude will just fill in the blanks, fill in the

Speaker:

gaps, especially if you're not specific or intentful with your instructions.

Speaker:

If you say code this, well, it's going to code it.

Speaker:

But is it going to be the way you wanted it?

Speaker:

Is it going to be, you know, is it going to work more than once?

Speaker:

You know, and will you be able to follow its thought process

Speaker:

as well?

Speaker:

Exactly.

Speaker:

Yes.

Speaker:

I think before the podcast, we were talking a little bit about like how you've also

Speaker:

maybe you're using a particular workflow here to actually also learn new things

Speaker:

because it may tell you things and then you're just like, what do you mean?

Speaker:

Yes.

Speaker:

Yes.

Speaker:

And I'll speak to that.

Speaker:

Yeah, there's a lot of times where Claude will present an issue to me that popped up

Speaker:

for review.

Speaker:

And I'll read over it and I'll be like, what?

Speaker:

What are you saying?

Speaker:

Yeah.

Speaker:

And so I will literally just ask, like, what do you mean?

Speaker:

Like, could you expand on this a little bit more?

Speaker:

And then after it starts talking, I was like, OK, I'm starting to get the picture.

Speaker:

I ask more specific questions.

Speaker:

Right.

Speaker:

And I keep going and I keep going.

Speaker:

And eventually I have the entire picture and then I'll make a decision.

Speaker:

Yes.

Speaker:

And sometimes by the time I get to there, I'll say, well, we don't need.

Speaker:

We don't need any of this.

Speaker:

Yes.

Speaker:

OK.

Speaker:

And I think that is so this is this is great because this is actually what I wanted

Speaker:

to ask you is how much time did you end up spending doing that?

Speaker:

Because that is a huge value add, right?

Speaker:

Like that is all of a sudden you're like not only learning about the system, you're

Speaker:

also able to then detect, like, this was not a valuable portion of the system,

Speaker:

let's just get rid of it.

Speaker:

That's actually a really good thing.

Speaker:

Honestly, like sometimes you know the tool can do so much.

Speaker:

It does a lot.

Speaker:

And then you all of a sudden you can be like, oh, no, we can simplify.

Speaker:

We should simplify it.

Speaker:

And that is something that takes software engineering teams years potentially before

Speaker:

they realize that the systems they built, there are some portions which are not

Speaker:

useful and not needed and we can get rid of them and then also reduce the amount of

Speaker:

time we take or we spend maintaining them.

Speaker:

And that's cost.

Speaker:

That's again leadership cost.

Speaker:

I'm like curious, like how long now it took you to be able to learn some things

Speaker:

based on like just asking Claude.

Speaker:

Was it days?

Speaker:

Was it hours?

Speaker:

Oh, it's usually minutes.

Speaker:

It's usually all right.

Speaker:

Well, it's usually minutes.

Speaker:

I mean, when something's costing me time, I

Speaker:

think it's going to be an easy lesson.

Speaker:

That's because you are definitely able to see, you know, your limit

Speaker:

getting exhausted.

Speaker:

Yeah, that's it.

Speaker:

I don't wait to see the percent change on weekly.

Speaker:

I hit refresh after two, three minutes and well, there's another percent gone.

Speaker:

And you're like, oh, that's that's a heavy.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

It was when I had access to Fable for the three days there

Speaker:

was I actually was not specific on something that Fable

Speaker:

was researching for me.

Speaker:

And it presented me with a couple of options.

Speaker:

And I said, expand on this option.

Speaker:

It launched a hundred-agent workflow

Speaker:

to research this and came back with the most worthless response

Speaker:

I've had, probably from Claude.

Speaker:

But it was successful in burning 20 percent of my weekly limit in 30 minutes.

Speaker:

Oh, that's 20x too.

Speaker:

Yeah, on 20x.

Speaker:

Yeah, that was about three point three million tokens for that response.

Speaker:

Oh, wow.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

And that would have been what that would have been if it was all output.

Speaker:

That would have been like one hundred and fifty bucks on the API.
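The back-of-the-envelope figure here can be checked with a few lines of arithmetic. The sketch below only uses the numbers quoted in the conversation (about 3.3 million tokens, about $150 if it were all output, 20 percent of the weekly limit). The per-million rate it prints is implied by those numbers rather than taken from any price list, and treating the weekly limit as a simple linear token budget is an assumption.

```python
TOKENS_USED = 3_300_000      # "about three point three million tokens"
QUOTED_COST_USD = 150.0      # "like one hundred and fifty bucks" if all output
WEEKLY_LIMIT_PERCENT = 20    # "20 percent of my weekly limit"


def implied_rate_per_million(cost_usd: float, tokens: int) -> float:
    """Price per million tokens implied by a total cost and a token count."""
    return cost_usd / (tokens / 1_000_000)


def cost_at_rate(tokens: int, rate_per_million: float) -> float:
    """Cost of a token count at a given per-million rate."""
    return tokens / 1_000_000 * rate_per_million


def implied_weekly_tokens(tokens: int, percent_used: int) -> float:
    """Weekly budget implied if usage scaled linearly with tokens (an assumption)."""
    return tokens * 100 / percent_used


rate = implied_rate_per_million(QUOTED_COST_USD, TOKENS_USED)
weekly = implied_weekly_tokens(TOKENS_USED, WEEKLY_LIMIT_PERCENT)

print(f"implied output rate: ${rate:.2f} per million tokens")
print(f"implied weekly budget: {weekly:,.0f} tokens")
```

The point of the exercise is less the exact dollar figure and more the ratio: one vague follow-up prompt consumed a fifth of a week's budget.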

Speaker:

Wow.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

That's that's a lot.

Speaker:

That's an.

Speaker:

And that is also something that happens often.

Speaker:

But no, that's that's really interesting.

Speaker:

So it moves so fast that it essentially did so much work.

Speaker:

That was not really worth much at the end.

Speaker:

It told you something you mostly probably already knew.

Speaker:

Yeah.

Speaker:

And I was really simple.

Speaker:

I just said expand on option two, versus like go do a million.

Speaker:

You know, I didn't say do like a deep research report.

Speaker:

I don't know a PhD on this.

Speaker:

Yeah.

Speaker:

I just I just wanted a simple explanation.

Speaker:

And the thing was, I'm running multiple workflows.

Speaker:

I alt-tabbed, came back.

Speaker:

I'm like, word, what happened here?

Speaker:

All right.

Speaker:

OK, so I do want to ask this question now is like in your mind, like, do you feel

Speaker:

like with events like this in general, like you now need to

Speaker:

really incorporate more rituals, more processes, make your project more of a

Speaker:

process if you want to make this into a.

Speaker:

Real life project or sorry, into a real life product or a real life.

Speaker:

Like, do you feel like now when you start running into things like this, your

Speaker:

confidence level has dropped enough that you now need to add things like guardrails

Speaker:

to increase your confidence, because one thing you already said was you did change

Speaker:

the process to do pre-work and that pre-work saved you from 90 to five iterations,

Speaker:

so like 90 iterations on a single module to five, which is massive.

Speaker:

Do you feel like now you'd be more interested in looking at opportunities?

Speaker:

To add more guardrails, to slow it down intentionally?

Speaker:

So that's a great thing to ask, because that's my next kind of

Speaker:

process improvement part of the workflow is that I realized that I'm creating

Speaker:

a lot of workflows for it to follow.

Speaker:

It doesn't always follow them correctly.

Speaker:

Right.

Speaker:

Even if it's literally reading a script.

Speaker:

And then I realized I probably should be using skills and I haven't.

Speaker:

And so

Speaker:

my next step actually is to implement all the workflows that I have been using over

Speaker:

and over and implement them as formal like Claude or Kodak skills.

Speaker:

Yes.

Speaker:

Nice.

Speaker:

OK, I may have some biased thoughts over there because I've

Speaker:

been using skills for a while.

Speaker:

But before we get into that, could I would you like to explain for audiences in case

Speaker:

like what is a skill versus like what is other prompting in traditional software or

Speaker:

prompting?

Speaker:

So, you know, you might be able to talk more to this, but

Speaker:

but I'll give you my interpretation of it first, because, again, I haven't actually

Speaker:

made a skill with Claude yet.

Speaker:

So that was the next thing I'm going to do.

Speaker:

There's a really useful tool from Anthropic to do it.

Speaker:

But this is the skill tree.

Speaker:

The skill creator is so, so creative.

Speaker:

From my understanding.

Speaker:

It

Speaker:

essentially

Speaker:

is like a set of instructions that points Claude to where it needs to find the

Speaker:

information to replicate something

Speaker:

over and over, like a workflow.

Speaker:

Right.

Speaker:

Yes.

Speaker:

For me, it's like I don't know exactly how it is that different than

Speaker:

me telling it to read a file.

Speaker:

See, so that that's the question, because that's what I had already.

Speaker:

Like, all I have to do is tell Claude, well, convert this runbook into a skill.

Speaker:

That's all I'm going to say.

Speaker:

OK.

Speaker:

And then let's see if it works.

Speaker:

Right.

Speaker:

I'm hoping it will because it's it's all right.

Speaker:

All the instructions are already there.

Speaker:

That's yes.

Speaker:

I think generally you've got the right gist of it.

Speaker:

It is a workflow for Claude to execute.

Speaker:

And in a sense of like instructions that it should know how to like methodically

Speaker:

say, do one, two, three, and you'll get the result that is intended as part of this

Speaker:

skill.

Speaker:

Like, for example, a skill could be like a tax preparation skill could be like, OK,

Speaker:

you know, put together the person's W2 and then put together all the right fields,

Speaker:

fill up the 1099 form or whatever the form is and then submit it.

Speaker:

Right.

Speaker:

Those would be the skill.

Speaker:

The thing I can add, from what I understand, about why it tends to behave differently than

Speaker:

traditional prompting, like just giving a prompt and then telling it to go read a file, is

Speaker:

that there is the concept of system prompts versus user prompts.

Speaker:

Yeah.

Speaker:

So system prompts are what the AI model essentially uses to give itself

Speaker:

context or like separate the role from like, oh, this is supposed to be me doing

Speaker:

some work versus this is what the person who's talking to me is asking.

Speaker:

So that person talking to me, from a model perspective, is the user prompt, whereas

Speaker:

the system prompt is essentially the actual model's own identity in a sense.

Speaker:

Like, what is its role?

Speaker:

What is its job?

Speaker:

What context is it?

Speaker:

Am I working for Anthropic?

Speaker:

What is my language supposed to be like?

Speaker:

Am I supposed to avoid saying things like, you know, like avoid profanity, avoid

Speaker:

doing like suggesting things that are not real, always base my things in reality.

Speaker:

So there are two prompts that every AI system or every chatbot system basically

Speaker:

usually uses.

Speaker:

There is a system prompt.

Speaker:

And there is a user prompt, because you could have n number of user prompts that get built

Speaker:

up over time.

Speaker:

But there's only one system prompt for that system to build over time, which was a

Speaker:

problem in the past, because if there were some very specific instructions, like tax

Speaker:

preparation, that is very repeatable, that's very standard.

Speaker:

The system prompt might not be able to hold every single set of

Speaker:

like tax preparation

Speaker:

and, you know, like a writing expert and let's say, you know, a software engineer.

Speaker:

Right.

Speaker:

You can't put all of those into the system prompt.

Speaker:

It just gets too big.

Speaker:

So the innovation here with the skill was that the system prompt

Speaker:

would have a stub and then you could replace that stub with a user provided set of

Speaker:

instructions, which was a skill.

Speaker:

So you have within the system prompt, a specific section that's like add skill text

Speaker:

here.

Speaker:

And even the skill is supposed to follow a particular format that works well with

Speaker:

that model, which is supposed to give information like, all right, give me the

Speaker:

circumstances under which the skill needs to be invoked.

Speaker:

Give me, you know, like the actual step by step instruction first.

Speaker:

Give me some examples like how this works.

Speaker:

Give me like some starting text and what the eventual response should look like,

Speaker:

because then all of those things can go into that system prompt.

Speaker:

And then it's like the.

Speaker:

The AI model has a more specific role and is now executing a particular skill like

Speaker:

tax preparation and instead of the user prompt defining all of those things, which

Speaker:

ends up sometimes also being isolated and separated because you're also doing things

Speaker:

like when a user asks you for something, the AI model may or may not do everything

Speaker:

because a user may ask you to do bad things as well, like, you know, maybe a prompt

Speaker:

injection trying to do something negative is also a possibility.

Speaker:

So that same isolation that occurs on the user input or the sanitization

Speaker:

that occurs on user input does not apply to the skill and the system prompt.

Speaker:

Therefore, the skill is intended to be more focused with the instructions and follow

Speaker:

a particular format.
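The mechanism as described in this part of the conversation (a stub in the system prompt that gets filled with a skill's trigger conditions, steps, and examples) can be sketched roughly as follows. This is only a toy model of the speaker's explanation, not how Claude or any other product actually loads skills. The `{{SKILL_TEXT}}` placeholder, the `Skill` class, and `build_system_prompt` are all hypothetical names.

```python
from dataclasses import dataclass, field

SKILL_STUB = "{{SKILL_TEXT}}"  # hypothetical placeholder inside the base system prompt


@dataclass
class Skill:
    name: str
    when_to_use: str                      # circumstances under which to invoke it
    steps: list[str]                      # step-by-step instructions
    examples: list[tuple[str, str]] = field(default_factory=list)  # (input, output)

    def render(self) -> str:
        """Render the skill as plain text in a fixed, predictable layout."""
        lines = [
            f"Skill: {self.name}",
            f"When to use: {self.when_to_use}",
            "Steps:",
        ]
        lines += [f"{i}. {step}" for i, step in enumerate(self.steps, start=1)]
        for example_input, example_output in self.examples:
            lines.append(f"Example input: {example_input}")
            lines.append(f"Example output: {example_output}")
        return "\n".join(lines)


def build_system_prompt(base: str, skill: Skill) -> str:
    """Replace the stub in the base prompt with the rendered skill text."""
    if SKILL_STUB not in base:
        raise ValueError("base prompt has no skill stub")
    return base.replace(SKILL_STUB, skill.render())


base_prompt = (
    "You are a careful assistant.\n"
    f"{SKILL_STUB}\n"
    "Always ask before submitting anything."
)

tax_skill = Skill(
    name="tax-preparation",
    when_to_use="The user asks for help preparing a tax return.",
    steps=[
        "Collect the wage form.",
        "Fill in the required fields.",
        "Present the return for review.",
    ],
    examples=[("Help me with my taxes.", "Which forms do you have so far?")],
)

print(build_system_prompt(base_prompt, tax_skill))
```

Keeping the skill in a fixed layout is the part that maps most directly onto the conversation: the trigger, the steps, and the examples always land in the same place, instead of being scattered across earlier user messages.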

Speaker:

That's that's maybe the overly detailed explanation of it.

Speaker:

But yes, the intention is supposed to be that repeatable workflows end up in

Speaker:

skills.

Speaker:

They actually get honored more effectively.

Speaker:

So they actually get treated more like the model's gospel, like things that it'll

Speaker:

follow religiously versus not.

Speaker:

So, yes.

Speaker:

But exactly as you said, it is a workflow.

Speaker:

It is actually intended to be focused on a specific area and solve that repeatedly.

Speaker:

Yes, we just went into a five minute conversation about what a skill is.

Speaker:

But anyway, Andrew, come back.

Speaker:

Sorry.

Speaker:

Yeah, skills.

Speaker:

So you want to try skills next?

Speaker:

Yes.

Speaker:

Yes.

Speaker:

I want to try to create my own skills through the workflows I already have

Speaker:

outstanding and hopefully I will get more consistency.

Speaker:

That's my goal.

Speaker:

Like the workflows work, but there's so many instructions and so many

Speaker:

things that need to be followed.

Speaker:

Yes.

Speaker:

The models are just forgetting, conveniently forgetting certain steps along the way.

Speaker:

I'll be like, why aren't you doing this?

Speaker:

And then it'll be like, oh, it was because I interpreted,

Speaker:

you know, something I said earlier differently.

Speaker:

Like, for example,

Speaker:

sometimes I'll say continue autonomously.

Speaker:

Yes.

Speaker:

And it's very simple, straightforward.

Speaker:

Very straightforward.

Speaker:

It understands that.

Speaker:

Right.

Speaker:

But sometimes it will still,

Speaker:

you know, a blocker will come up even though I have the protocol for that.

Speaker:

It will show the decision menu and it would pause the whole workflow.

Speaker:

I'm not looking at it 24-7.

Speaker:

I come back, it's been sitting there for five hours waiting for me to say something.

Speaker:

Yeah.

Speaker:

And you're like, well, you know, you should have just continued autonomously.

Speaker:

You had the information.

Speaker:

Yeah.

Speaker:

And then I would ask it, so why did you stop?

Speaker:

Yep.

Speaker:

And it was funny.

Speaker:

It literally said I didn't have a good reason to stop.

Speaker:

Oh, my gosh.

Speaker:

OK.

Speaker:

It needed your direction, Andrew.

Speaker:

Yeah, yeah, yeah.

Speaker:

And yeah, it was it's it can lose, I guess.

Speaker:

It was a simple thing to remember, but it still got drowned out

Speaker:

over time.

Speaker:

Yeah, I mean, and I definitely.

Speaker:

I think that is a challenge when context windows get long.

Speaker:

Yeah.

Speaker:

Models have been known to bias towards remembering the last thing you told them or

Speaker:

the first thing you told them.

Speaker:

And everything in between kind of just gets muddled.

Speaker:

Modern techniques and, you know, like the latest versions of Claude may be

Speaker:

better at this in certain

Speaker:

circumstances, but they are still susceptible to it.

Speaker:

So there's still a possibility that will occur.

Speaker:

Something in the middle just gets lost.

Speaker:

When you said I imagine when you said continue autonomously, it's probably like

Speaker:

continue what?

Speaker:

And then it just was like, well, it was a bit better than that.

Speaker:

Well, yeah, I understand.

Speaker:

I totally agree.

Speaker:

Yeah.

Speaker:

So this does feel like there's some aspect of it where it got lost in the sauce

Speaker:

somewhere at some point.

Speaker:

And that is where something like a skill, which is like, OK, regardless of what the

Speaker:

person is asking me, this is what I'm supposed to be able to do is like a kind of

Speaker:

like it's it's kind of like a grounding.

Speaker:

Truth for it.

Speaker:

It's like this is the ground truth for me to follow or the grounding instructions

Speaker:

for me to follow.

Speaker:

So it will always consider that, OK, regardless of what this person

Speaker:

said, how does that work into the grounding truth of these instructions

Speaker:

that I'm supposed to follow and then it'll ask for clarification ahead of time and

Speaker:

it won't just arbitrarily just wait so that that is maybe the way to put it as well

Speaker:

as like you can if you have a skill for a researcher, the researcher will be like,

Speaker:

oh, I can't start until they give me all this information.

Speaker:

But once you give me that information, I can do this thing on its own.

Speaker:

And the similar thing is like when you hand it off from a researcher to, let's say,

Speaker:

an actual planning agent, then that planning agent also, if it has the right skill,

Speaker:

can also be like, let me clarify what I need ahead of time and then I can basically

Speaker:

move on.

Speaker:

So that's also like some of the skill benefits are also like giving a very solid

Speaker:

understanding of what's required to start.

Speaker:

And then because now you have a very clear understanding of what's required to

Speaker:

start, the model can also ask questions to make sure that it has enough information.

Speaker:

So you can ask clarifying questions and do all that stuff ahead of time.

Speaker:

So, yeah, sorry.

Speaker:

But yeah, I can keep going.

Speaker:

I think the important part for me

Speaker:

is that just like you were saying, as we started this like little tangent here,

Speaker:

was I like losing confidence in it, you know, performing things that I have

Speaker:

already established.

Speaker:

I have done what the models just did.

Speaker:

But yes, sorry, sorry.

Speaker:

And I'm hoping, like you implied, that the skills

Speaker:

will be a guardrail, that it will protect this workflow.

Speaker:

It's like I feel like the workflow is at a point where it's near perfection.

Speaker:

It's never going to be perfect, but it's near perfection enough that I really want

Speaker:

it to follow it.

Speaker:

And it needs to be able to follow it over multiple hours.

Speaker:

Yes.

Speaker:

Yes.

Speaker:

And I think that's a really important part is over time.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

Because there's times where I see it, you know, midstream, there's regression.

Speaker:

It's like, oh, it's no longer like I'll try to give some concrete examples.

Speaker:

Like I would ask it to, like, it's part of a workflow which is in a runbook.

Speaker:

So it has a file to check what it needs to do every time.

Speaker:

This is obviously not a skill.

Speaker:

I would say please print out the findings per model.

Speaker:

The severity.

Speaker:

Right.

Speaker:

A little brief description of what it was.

Speaker:

Yes.

Speaker:

Right.

Speaker:

And so that way, when I come over to check the logs later.

Speaker:

Yes.

Speaker:

Yeah, I can see.

Speaker:

OK, DeepSeek found this.

Speaker:

OK, Codex found that.

Speaker:

All right.

Speaker:

And I also have some requirements for it to keep track of DeepSeek usage since I'm

Speaker:

actually paying API spend on that.

Speaker:

Yes.

Speaker:

There's a lot of notes that I can go over and review later with Claude to optimize

Speaker:

things more, determine whether these models are worth keeping around.

Speaker:

Yeah, right.

Speaker:

And that's great.

Speaker:

I was actually curious.

Speaker:

So do you use these output contracts as the union mechanism as well?

Speaker:

Like, you mentioned that you take the union of all the findings?

Speaker:

Yeah.

Speaker:

You rely on the contracts to essentially.

Speaker:

Yes.

Speaker:

Yes.

Speaker:

Because they're supposed to wait.

Speaker:

The orchestrator is supposed to wait for every single one to finish first.

Speaker:

They're all running blind.
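A minimal sketch of that idea, assuming each reviewer is just a callable standing in for a model call: every reviewer runs independently on the same source, the orchestrator blocks until all of them have finished, and only then are the findings merged into a union. The function name, the `(location, description)` tuple shape, and deduplicating by location are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_blind_reviews(reviewers, module_source):
    """Run every reviewer independently and return the union of findings.

    `reviewers` maps a name to a callable taking the source text. Each
    callable stands in for a model call and never sees another reviewer's
    output, which is what keeps the reviews blind.
    """
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        futures = {name: pool.submit(fn, module_source) for name, fn in reviewers.items()}
        # .result() blocks, so nothing is merged until every reviewer is done.
        results = {name: fut.result() for name, fut in futures.items()}

    union = {}
    for name, findings in results.items():
        for location, description in findings:
            # Assumption: two findings at the same location are the same issue.
            entry = union.setdefault(location, {"description": description, "found_by": []})
            entry["found_by"].append(name)
    return union
```

Recording `found_by` keeps the per-model attribution even after the merge, so an issue caught by only one reviewer is still visible as such.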

Speaker:

Nice.

Speaker:

All of them have optimized prompts per angle for that

Speaker:

model.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

For that specific module, what I'm looking for.

Speaker:

Oh, nice.

Speaker:

Yes.

Speaker:

Yeah.

Speaker:

And it's preloaded again with all the trace patterns that it already knows it needs.

Speaker:

Nice.

Speaker:

And so as the reviews go along, it's only deltas.

Speaker:

And one question, just to clarify as well: I said contracts.

Speaker:

I didn't clarify.

Speaker:

It's a structure, right?

Speaker:

It's basically like a schema.

Speaker:

It's like, yes.

Speaker:

Yeah.
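An output contract in this sense, a schema every reviewer's output has to satisfy before it can be merged, might look something like the sketch below. The field names and the JSON-list shape are assumptions; only the severity names come from the conversation.

```python
import json

# Hypothetical contract: every finding must carry these string fields.
REQUIRED_FIELDS = ("model", "severity", "location", "description")
ALLOWED_SEVERITIES = {"blocker", "warning", "defer", "suggestion", "nit"}


def parse_findings(raw_json):
    """Parse a reviewer's output and reject anything that breaks the contract."""
    findings = json.loads(raw_json)
    if not isinstance(findings, list):
        raise ValueError("contract expects a JSON list of findings")
    for i, item in enumerate(findings):
        if not isinstance(item, dict):
            raise ValueError(f"finding {i}: expected an object")
        for field in REQUIRED_FIELDS:
            if not isinstance(item.get(field), str):
                raise ValueError(f"finding {i}: missing or invalid '{field}'")
        if item["severity"] not in ALLOWED_SEVERITIES:
            raise ValueError(f"finding {i}: unknown severity '{item['severity']}'")
    return findings
```

Rejecting malformed output at this boundary means the merge step can assume every finding has the same shape, regardless of which model produced it.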

Speaker:

OK, so sorry, just a quick clarification, but continue with the workflow

Speaker:

here.

Speaker:

Yeah, yeah, yeah.

Speaker:

So another part of refining the reviews that I remembered was that, instead

Speaker:

of reviewing the whole module every single time,

Speaker:

it would start to nail down on its own where the problems are, where the seams are

Speaker:

really obvious, is maybe a good way to put it.

Speaker:

And at a certain point, the majority of the reviews are only deltas.

Speaker:

What did we change?

Speaker:

What needs more attention?

Speaker:

Was the fix implemented correctly?

Speaker:

Did the test work right?

Speaker:

Yes.

Speaker:

And at a certain point it converges.
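One way to picture delta-only reviews: fingerprint every file after each round, and in the next round look only at files that changed or were added, plus anything earlier rounds flagged as needing more attention. This is a sketch under those assumptions; the function names and the hash-based comparison are illustrative, not the actual workflow.

```python
import hashlib


def fingerprint(files):
    """Map each path to a hash of its contents."""
    return {path: hashlib.sha256(text.encode()).hexdigest() for path, text in files.items()}


def review_scope(previous, current, flagged=()):
    """Return the paths worth re-reviewing this round.

    That is every changed or new file, plus still-existing files that an
    earlier round flagged for more attention.
    """
    changed = {path for path, digest in current.items() if previous.get(path) != digest}
    carried = {path for path in flagged if path in current}
    return sorted(changed | carried)
```

When `review_scope` comes back empty, there is nothing new to look at, which is one simple signal that the rounds are settling down.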

Speaker:

There's no more. And I stage it, like you had priority one, two, or three.

Speaker:

Yes.

Speaker:

I have blockers, warnings,

Speaker:

defers, suggestions.

Speaker:

Yeah.

Speaker:

And nits.
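Those five tiers map naturally onto an ordered enum. The sketch below sorts findings so the most serious come first and treats a round as converged once nothing at warning level or above remains. That convergence rule is an assumption for illustration, not something stated in the conversation.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Lower value means more serious, following the order the tiers were listed.
    BLOCKER = 0
    WARNING = 1
    DEFER = 2
    SUGGESTION = 3
    NIT = 4


def triage(findings):
    """Sort (severity, description) pairs so the most serious come first."""
    return sorted(findings, key=lambda f: f[0])


def has_converged(findings, threshold=Severity.WARNING):
    """Assumed rule: converged when nothing at or above `threshold` remains."""
    return all(severity > threshold for severity, _ in findings)
```

With this ordering, defers, suggestions, and nits can pile up without blocking a module, while a single blocker or warning keeps the review loop going.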

Speaker:

Well, yeah, that is actually also very common software engineering

Speaker:

terminology.

Speaker:

Yes, I mean, Claude gave it to me.

Speaker:

Yeah, I mean, I asked it like, well, how would we set this up?

Speaker:

How would we define the different levels?

Speaker:

And that is what it chose for me.

Speaker:

And it made sense.

Speaker:

And I was like, oh, I'll go with it.

Speaker:

It makes sense.

Speaker:

It's pretty, pretty.

Speaker:

It's a good one.

Speaker:

Yeah.

Speaker:

OK, I think the one thing I will definitely say here is that it sounds like

Speaker:

we're kind of, in a sense, also converging onto a particular process,

Speaker:

because a lot of what I'm hearing is actually very common with

Speaker:

the way that, you know, we operate as a software company and

Speaker:

also the way that my previous companies operated: treating software

Speaker:

development more as a process.

Speaker:

And I'm even more curious now whether you feel like you're transitioning

Speaker:

from more of a vibe coding approach to more of a process-driven approach, or do you feel

Speaker:

like, oh, no, I don't actually want to get too much into process? Because that's

Speaker:

another big thing I've seen still, like I don't know how prevalent the term still

Speaker:

is.

Speaker:

I still think it's very prevalent.

Speaker:

Vibe coding, I hear it all the time.

Speaker:

So I want to get your take: do you feel like going away from vibe coding and more toward

Speaker:

engineering is a good thing, or do you feel like, I don't necessarily want to go into

Speaker:

engineering? Because there's also a traditional view. If you talk to people in the

Speaker:

industry, they're like, software engineering is so slow.

Speaker:

It's a very common thing to hear as well, because, yes, there are processes and

Speaker:

rituals that make it slower.

Speaker:

So do you feel like you want to avoid it becoming software engineering, or what's your

Speaker:

take on that?

Speaker:

I'm curious.

Speaker:

Oh, I think.

Speaker:

I think this kind of puts everything we've talked about a little bit more...

Speaker:

No, I think it all comes together here.

Speaker:

I was complaining earlier about New World and the feeling at launch.

Speaker:

Right.

Speaker:

Not being able to scale when clearly it was Amazon who made the game, and they own

Speaker:

AWS.

Speaker:

They have all the skills for this.

Speaker:

What happened?

Speaker:

It didn't make any sense to me. But

Speaker:

to take everything together,

Speaker:

one of my biggest focuses on this is that I would like to do it right.

Speaker:

Yeah.

Speaker:

And to do it right will require some standardization of processes.

Speaker:

So I guess to answer your question directly, I do think that this is becoming more

Speaker:

process driven than purely vibe coding.

Speaker:

Now, I guess maybe day one it was vibe coding because I didn't really have a

Speaker:

structure to anything yet.

Speaker:

Right.

Speaker:

It was a blank slate for me.

Speaker:

This is the first time I've... you know, I've written code manually before

Speaker:

in classes and stuff like that, but not two hundred thousand lines of anything.

Speaker:

Oh, yeah.

Speaker:

Oh, yeah.

Speaker:

And one thing I will throw out there is, I think, Andrew, we were also talking a

Speaker:

little bit about your profession, and sysadmin work is still very... there are a lot of

Speaker:

systems that you need to put together.

Speaker:

Right.

Speaker:

Right.

Speaker:

There's a lot of connections.

Speaker:

There's a level of architecture that you also need to consider, like, what

Speaker:

do these things actually do, and understand them in enough depth that you can

Speaker:

put them together.

Speaker:

And I feel like, you know, like we talked about the perspective that you're bringing

Speaker:

here, we also talked about sysadmins in particular maybe being a really good

Speaker:

audience for this kind of tooling because, you know, software engineering

Speaker:

is a very wide term and a software engineer does a lot of things.

Speaker:

You can have like specialized roles.

Speaker:

You can also have generalists.

Speaker:

I feel like as a sysadmin, you have to be a generalist to a large extent because

Speaker:

you're working directly with people and your scope is always huge.

Speaker:

So a lot of what we can do.

Speaker:

Right.

Speaker:

We do try to automate as much as possible.

Speaker:

Right.

Speaker:

That's how we have many more hands.

Speaker:

Solve it once.

Speaker:

Yeah.

Speaker:

Get it fixed one time.

Speaker:

Yep.

Speaker:

We do thankfully have other teams that can take different things like help desk and

Speaker:

stuff like that.

Speaker:

So we're not necessarily doing like all of the front line stuff all the time.

Speaker:

But when it comes to, like, campus infrastructure, networking issues, say

Speaker:

the website's down or, you know, something like that, that would

Speaker:

definitely fall into take for granted.

Speaker:

Yeah.

Speaker:

And into our wheelhouse.

Speaker:

I guess I think I got away from your question, though.

Speaker:

Could you repeat it?

Speaker:

The main aspect, from a risk perspective, is that I do feel like you're building

Speaker:

something in your personal time, and you also realize there is only so much time, so

Speaker:

much token budget that you have, and a combination of that.

Speaker:

And also wanting to deliver something for your

Speaker:

personal project.

Speaker:

And do you feel like at this point the risk of token

Speaker:

expenditure is maybe something that drives your decision towards going more into

Speaker:

a process-oriented approach?

Speaker:

Absolutely.

Speaker:

Or do you think it's maybe also a combination of the experience you've had

Speaker:

orchestrating these systems in your professional life?

Speaker:

Or maybe it's a combination of both.

Speaker:

That's interesting.

Speaker:

I think specifically for Minecraft, it's more informed by my previous experiences

Speaker:

running LuxWander.

Speaker:

Ah, yeah.

Speaker:

OK.

Speaker:

Actually.

Speaker:

And actually, I just realized that's a keyword

Speaker:

I just dropped right in there.

Speaker:

We're going to have to bleep that out later.

Speaker:

No, no, no, it's OK, it's OK.

Speaker:

I think it's more inspired by that.

Speaker:

There are a lot of ways where I could maybe tweak it to parallel some things

Speaker:

that work, but I think what drives a lot

Speaker:

of the development,

Speaker:

and a lot of the decisions, comes directly from experience that

Speaker:

I had running LuxWander before. And I

Speaker:

guess, as a little fun story, when LuxWander first released in 2010,

Speaker:

it was, you know, Minecraft pre-alpha.

Speaker:

That was, I don't like to say it, 16 years ago.

Speaker:

Yeah, it was 16 years ago.

Speaker:

Oh, yeah.

Speaker:

And don't remind me.

Speaker:

And

Speaker:

you know, I released some advertisements online, like on minecraftforum.net,

Speaker:

right?

Speaker:

Like I had like a thread, you know, and there was rudimentary,

Speaker:

you know, multiplayer Minecraft.

Speaker:

It crashed all the time.

Speaker:

You know, parts of the map corrupted all the time.

Speaker:

Updates were coming every day, you know.

Speaker:

So it almost feels like your original, that predated all of your professional

Speaker:

experience.

Speaker:

So yes, yes.

Speaker:

But your love for actually building this community and this actual version of

Speaker:

Minecraft that everybody could enjoy in the way that you wanted was

Speaker:

more of a driver, and even today continues to be more of a driver, for

Speaker:

building something really awesome. Not to say that your professional experience

Speaker:

doesn't help a little bit here and there, but maybe it's a combination of

Speaker:

wanting to build something, having the, you know, wish to build something,

Speaker:

and also a little bit of the learnings that you've had over time, you know,

Speaker:

professionally and personally in your previous experience.

Speaker:

My reasoning for this is that the move from a project-driven

Speaker:

approach of, let's just vibe code it and hope it works, hope we

Speaker:

can deploy it, versus, oh, I actually have built something that people have used

Speaker:

and I want to build something again that people will use and will really love,

Speaker:

is a very strong driver for saying that I'm not just playing around.

Speaker:

I'm building something from a place of wanting it to be successful.

Speaker:

And that is maybe also part of the risk: I don't actually want

Speaker:

to build something bad either, and that is a personal feeling about it as

Speaker:

well.

Speaker:

But it's driving you to now make decisions that are resulting in like more definable

Speaker:

processes and actually improving the quality as well as reducing the cost so you can

Speaker:

actually finish and get it out the door.

Speaker:

And I mean, I said a lot of things.

Speaker:

Let me maybe finish it up and ask a question: do you feel like

Speaker:

these tools have really actually enabled you, or do you feel like

Speaker:

these tools are just giving you more of a mirage? The AI tools,

Speaker:

Claude in particular so far.

Speaker:

Was it so far?

Speaker:

So that is a really interesting question because I cannot say definitively

Speaker:

yet until I start testing it right now.

Speaker:

So.

Speaker:

So I think ask me in a few weeks again.

Speaker:

Yeah, because right now it's like undefined.

Speaker:

I don't have any proof yet besides the development.

Speaker:

And so I would like to answer that confidently.

Speaker:

But I guess I can answer the idea before that.

Speaker:

I do think it's really empowered me to actually create something that I've always

Speaker:

wanted to do.

Speaker:

Right.

Speaker:

There are a lot of things, like a big wish list of stuff that I had

Speaker:

the last time I ran Lux Wanderer,

Speaker:

but I just didn't really have the manpower.

Speaker:

I didn't have like the skills, you know, that's funny.

Speaker:

Yeah.

Speaker:

You still don't know.

Speaker:

You will.

Speaker:

You will.

Speaker:

You will.

Speaker:

Skills creator.

Speaker:

I will soon.

Speaker:

And

Speaker:

it has closed the gap, though, that I think like you said, it's democratizing,

Speaker:

you know, I guess, intelligence.

Speaker:

Yeah.

Speaker:

Yeah.

Speaker:

And then, you know, it might all just be an act.

Speaker:

And I think that is something that we will have to see, if it is an

Speaker:

act of intelligence.

Speaker:

That's all right.

Speaker:

And at that point, I think we got to end the show.

Speaker:

Thank you.

Speaker:

Thank you.

Speaker:

We'll have to come back and see if it is.

Speaker:

Yes.

Speaker:

Yes.

Speaker:

Cut the cut.


About the Podcast

Act of Intelligence
Act of Intelligence is a podcast where a software engineer (Ajay Medury) and systems engineer (Andrew Sierota) trade honest, in-the-trenches notes on building real things with AI — the tools, the workflows, and the philosophy — to figure out which old engineering wisdom still holds and which new instincts to trust.

About your hosts

Ajay Medury

Andrew Sierota
