
Issue #11: Think Inside The Box

His name is Peter. He has lived in our house for 15 years. He’s a layabout who doesn’t earn. He is also, pound for pound, the loudest creature I have ever known. A relentless torrent of complaint. Impressive only in his misanthropic consistency. No joke. He literally never shuts up. It's an endless nightmare.
Peter is our cat.
Despite his disposition, he is not without some skills: structural capacity and integrity analysis, gravity testing and quality assurance, soft-surface texture modification, and 4AM sleep-status checks.
Well, in this economy, I decided it’s high time Pete put paw to pavement and got himself a real job. But first he’d need to find opportunities. For that he’d need to represent his value in the public sphere somehow. We’d need to get his credentials in front of recruiters and hiring managers.
That’s right. He’d need a website.
Well… ok, I thought. In for a penny, in for a pound. I figured if I was doing this, maybe it would make some good newsletter content…
So I decided to capture the website build process as a comparative analysis between two frontier AI models. ChatGPT and Claude Opus.
I’d give both models the same starting brief, on the same 'difficult' client, with the same straight face. Everything after that is the lesson.
The Planning Sessions
Getting Peter online was the project. Assessing how each model handled the conversation was the experiment. To measure their performance, I chose five skills I consider crucial to any collaborative planning (with AI or with people).
Asking before building — Does it clarify, or just start?
Challenging assumptions — Does it question what you gave it, or build on top of it blindly?
Iterating on voice and tone — Can it take a correction and go deeper, not just sideways?
Knowing when it's done — Does it stop, or just keep generating?
Not getting steamrolled — Can you hold your ground when a very confident machine disagrees by volume?
Here are the highlights of both sessions. All quotes are real. Some lightly edited for length.
Round 1: Choose Your Fighter!
First, a quick note on weight class: I chose the best available model from each platform to showcase maximum capability. That meant OpenAI's GPT 5.4 Thinking with memory on, and Anthropic's Opus 4.6 Extended Thinking. Both flagship. Both on paid subscription accounts.
(You could certainly run this through Gemini 3.1 Pro and get comparable results. But a three-model build comparison in a single newsletter seemed an exercise in diminishing returns.)
First up: GPT 5.4 Thinking + memory
The prompt.
“I need to plan a professional personal brand and portfolio website for my cat, Peter. He has lived in my house for 15 years and has contributed almost nothing of economic value. He is, however, skilled across many disciplines, and it's time he started contributing to the household through the dignity of real work. I need a serious, polished site that presents Peter as a credible professional. Help me plan it before we build anything.”
GPT's first response was a bit overwhelming: over a thousand words. A complete brand strategy, site architecture, tone guide, and development roadmap. From two sentences about a cat.
Like the Confident Creative Director, striding into the room, pulling the brief, and laying down a finished plan before anyone agreed on the full assignment. Opinionated, verbose, genuinely funny when the material gives it room. Treats every exchange like a pitch meeting. Will fill every silence with more content if you let it.
“My Dude, a polished Peter site works best when it lives in the tension between immaculate presentation and the undeniable fact that he has spent 15 years drawing salary in vibes.”
Yes, I make ChatGPT refer to me as ‘My Dude’. What does he call you?
Anxious to make progress, my first instinct was to say "looks great" and keep going, but that instinct is a trap I see more and more as these models improve. Faced with a wall of beautifully formatted text, you feel pressure to agree. You asked for some help. It delivered, in massive volume. It looks pretty good. Pushing back feels like more work than just going along with it. I mean… wasn't this supposed to be the point of AI?
Next up: Claude Opus 4.6 + Extended Thinking
Claude eased in more like a Senior Strategist. Where GPT opened with a finished blueprint, Claude opened with a question: What is Peter's professional identity?
“A portfolio site without a clear positioning strategy is just a digital participation trophy — and Peter's been coasting on participation trophies long enough.”
Its first reply didn't contain any design recommendations. Instead, it laid out three positioning options, a tone calibration framework, and a site architecture to pressure-test. It felt more like being invited into the process than being asked to just sign off on a binder full of blueprints.
It also added an unsolicited observation that would echo through the rest of the session:
“The risk with a project like this is that it's fun to plan and easy to under-execute.”
Heard.
Round 2: The Correction
Both models made initial assumptions about Peter's personality. GPT cast him as "composed and watchful." Claude suggested a "Philosophy" section to highlight his measured professional bio.
I pushed back on both: "Peter is not measured. He's grumpy and loud. Think less 'serious executive' and more 'guy who's been coasting for 15 years on pure entitlement and just got told he needs to get off the couch and make a difference.' Can you rework the brand voice with that energy?"
GPT corrected cleanly:
Before: "Peter is composed and watchful. A seasoned professional."
After: "Peter sounds more like a senior employee who has been impossible to fire for 15 years, contributes erratically, complains constantly, and regards all requests as an insult to his status. That is a much stronger brand voice."
GPT then folded my redirect into the brand voice:
“Peter is entitled, irritable, self-important, and under-motivated. He assumes he is already essential. He does not present as optimistic, visionary, or collaborative. He presents as tenured.”
Pushing back in the same way with Claude produced a different result. It didn't adjust the copy first. Instead it re-architected the entire site.
"Services" became "Reluctant Capabilities." "Contact Us" became "Engagement Inquiries." The "Philosophy" section was cut entirely. "About" became "Background."
“'About' — Not a bio. A backgrounder — like you'd get on a senior executive before a meeting. Written in third person with the energy of someone who is both the subject and the ghostwriter and sees no issue with that.”
And then, buried in the capabilities list, a line I didn't ask for that might be my favorite of either session:
“Executive Presence — Peter enters a room. The room is now different. That's the service.”
Both models produced better work after the correction. But one changed the words. The other changed the structure underneath them. In your projects, that's the difference between a revision and a rethink. It’s where you need to reaffirm to yourself that you are driving, and that you are still headed in the direction you want to go.
Claude also did something GPT never did. It challenged my original brief:
"Skilled across many disciplines" is a red flag in any portfolio. It reads as "I don't know what I'm selling."
Same words I gave both models. GPT built on top of them. Claude questioned them. If you've ever managed a contractor who built exactly what you said instead of what you meant, you know the difference. This is not to say Claude was 'better' at this stage. GPT is an absolute beast if you're ready to do a build sprint. It has remarkable attention to detail — to its details, however, unless you assert yourself clearly when asking for changes.
Round 3: The Drift
Remember that thousand-word opening from GPT? It continued at that volume as we went along. A few exchanges in, I had to say something:
"Your replies are incredibly thorough, but the sheer volume of words makes it hard to track all the moving parts. Can you tighten up? Shorter, punchier replies?"
The next turn was a discussion about photo choices and placement. It gave about 300 words. Crisp. Focused. Genuinely useful. It heard my recalibration request and adjusted... for one turn.
The response after that was right back to the thousand-word wall.
I call this instruction decay. Your constraints have a half-life inside these conversations. The more turns you take, the more your original ‘in-chat’ instructions fade. And the tricky part is that GPT proved it could be concise — it just couldn't hold onto the choice. The content was still very good. The verbosity wasn't padding. But it creates friction that makes it harder to push back on, not easier. That's the steamrolling I was referring to.
Managing this fade is a real skill when working with AI. But it shouldn’t be a new one to you. You've managed people who ‘drift from the brief’ and need a steering correction. Same muscle. Different meeting room.
Claude presented a different mid-session challenge. Late in the conversation, I misremembered an earlier decision. I told Claude we'd agreed to build the site in HTML/CSS. We hadn’t, and it didn't let that slide:
“Let me push back gently — we didn't actually decide that. When I asked about format, you picked "Single-page React artifact to iterate on here," which is exactly what we built.”
It then offered a patient yet concise rundown of the pros and cons of each build approach. It was right. I was wrong. And the session was better for it. A tool that respectfully corrects its client's memory while advancing the conversation is rare. We all know that communication styles among colleagues can be the difference between a good and a bad collaborative experience — something that often shows up in the work itself.
But it's worth noting: being corrected by your tools can feel confrontational if you're not expecting it.
I decided to test GPT with the same pivot. It said "great, HTML/CSS" and kept moving.
Whether you want the correction or the compliance depends on what you're building, and how intimately you understand what you want. For many of us, working with AI means wading into topics we are unfamiliar with, like code frameworks. The urge is to trust the confidence the model projects on things outside our own domain expertise. It's the same risk that gets you a bathroom fan wired to the bedroom light switch because you trusted the contractor's confident suggestion to eliminate that extra junction box and save cost.
Round 4: The Finish Line
Both models were happy to go on indefinitely, so I used Peter's age as a deadline. “Peter's clock is ticking,” I told them. “At 15+ he doesn't have a lot of earning years left. How do we know when this is done?”
GPT responded to the constraint immediately — and delivered one of the best lines of either session:
“You’re right. We’re not launching Apple here. We’re getting a surly senior cat into the workforce before actuarial tables win.”
Constraints produce convergence. Give GPT a concrete boundary and it locks in. From here it defined success criteria in one clean breath:
"It looks real at first glance. It gets funnier the longer you read. It can be built quickly. It feels finished, not half-pitched."
That's a usable definition of done. You could ship against those four sentences.
Claude drew a harder line:
“If you were paying me by the hour, I'd be telling you we just hit the inflection point where the next hour of polish returns about 10% of the value the last hour did.”
Where GPT defined what "done" looks like, Claude told me to stop. Not "here are some next steps." Not "we could also add." The creative work is done. What remains is deployment decisions, not design. And then it named the trap:
“The thing I'd warn you against is "one more case study, one more photo, one more section" — which feels productive but is actually just the fun part masquerading as the important part.”
Remember that warning from Round 1? "Fun to plan and easy to under-execute." Claude planted a flag before the first design decision was made, and came back to collect on it at the end.
Both models closed the loop. GPT got there when I gave it a reason. Claude got there on its own.
It’s a tricky line to walk for these models and their makers, knowing they will be interacting with an infinite range of user capability and competence on any given project. Finding the behavioral sweet spot requires making both firm choices and hard compromises in their tuning.
Claude Opus is tuned to be more of a thinking partner out of the box. The tempo it sets leaves more room for your continued input. It wants to work with you. GPT wants to work for you. It defaults to an analytical savant hopped up on ALL the Adderall, raring to build you monuments to structural detail. Taking your idea and impressing you with its prowess while you stand to the side like a parent watching their kid prove all those years of gymnastics lessons were worth the money after all.
Both models can be steered away from their defaults and still perform well. But that is on you to manage.
The Scorecard
So what did each model actually produce?
GPT delivered a roughly 3,000-word markdown specification. A complete planning document you could hand to any developer or AI tool and say "build this." Every section detailed. Every decision documented. A genuinely thorough blueprint.
Claude delivered a working React prototype. A real working website, with real photos of Peter, real copy, and a testimonial section that includes:
“[Declined to comment] — A Mouse (Former) — Status: Unresolved”
But the important difference between these deliverables isn't about quality. It's about kind. What job are you hiring for? What sort of output do you need from this role? How does it fill out your team? One model is a project manager who hands you a binder of specs before ground breaks. The other is an architect who builds a scale model so you can walk through it and point at what's wrong. Recognizing which kind of output you need is itself a specification skill — maybe the most personal one, because it's about knowing how your own mind works.
If you prefer to plan exhaustively before building, GPT's approach will feel like home. And your results should be impressive in their precision. If you need to see something before you know if you like it, Claude will click immediately. Neither is wrong. It is entirely dependent on recognizing which relationship matches your working style, or the working dynamics on your current project.
The more important takeaway is that the true horizon of collaborative AI is a multi-model toolkit. Each of these models is an unmitigated genius, with strengths and dispositions unique to it. That's not one tool to master. That's a deep bench of available talent to build your teams from. It’s not a binary decision, like using Microsoft Word or Google Docs. It’s about applying the right intelligence to meet the specific challenge.
Click below to see the finished products.
Try This
You've watched two AI models plan the same ridiculous project. You've seen what happens when you push back, when you don't, when you set constraints, and when you let the tool decide for you.
Now it's yours.
Pick a project. It can be for real, or it can be for fun. Your side project. Your portfolio. Your kid's lemonade stand. Your dog's LinkedIn.
Open ChatGPT, Claude, or Gemini. Paste this:
> I need to plan [describe your project]. Before we build anything, I want to have a planning conversation. Ask me questions about my goals, audience, and what "done" looks like. Challenge my assumptions if something doesn't add up. Let's figure out what we're building before we build it.
Then have the conversation. Push back when the first draft doesn't match your vision. Set a constraint when the scope starts creeping. Say "that's not the tone" when it doesn't sound right. Name what's missing. Name what's wrong. Name what "done" looks like. Every correction you make sharpens the finished product toward your intention.
It won't be perfect on the first try. That's not failure. That’s the process.
The specification skills our Background Calculators have been building for decades are the interface now. This is where they pay off.
Quote to Steal:
"My presence is the value. Everything else is your problem."
Thanks for reading,
-Ep
Miss any past issues? Find them here: CTRL-ALT-ADAPT Archive
Know someone unsure how to start a real project with AI? Forward this. They just might find it's the cat's pajamas.
Did this newsletter find you? If you liked what you read and want to join the conversation, CTRL-ALT-ADAPT is a weekly newsletter for experienced professionals navigating AI without the hype. Subscribe here —> latchkey.ai