Our Best AI Investment of 2025: Time to Fail

December 20, 2025 · 12 min read
AI · Engineering Leadership · Productivity · Team Culture

Your engineers have the tools. What they don't have is permission to fail. Sprint pressure turns every AI experiment into a risk. Nobody wants to be the person who spent three days on a ticket because they were "trying something new." So adoption stays stuck in the margins—safe, shallow, going nowhere.

Sound familiar? That was us at Spectora a few months ago. Thirty engineers, all aware that AI was important, all dabbling on the edges, none of us really going deep. We were collectively stuck in AI purgatory. Like a bunch of people standing at the edge of a cold plunge, dipping our toes in.

Then we tried something deceptively simple that actually worked: we gave our team real time to fail.

TL;DR

We ran a two-week AI hackathon at Spectora with one rule: failure was the goal. Week one: individual experimentation. Week two: team projects. The result wasn't fancy metrics—it was a team that finally had permission to go deep, share what broke, and build real intuition. The key insight: stop waiting for organic adoption. Create structured space to fail.

The Problem With Learning in the Margins

Here's what I've observed across every engineering org I've talked to: AI adoption happens in the margins. Engineers squeeze in a prompt between meetings. They try a new tool on a Friday afternoon. They watch a YouTube video while eating a sad desk salad.

The problem? Margins aren't enough.

You can't develop real intuition in stolen moments between sprint work. It's like trying to learn Spanish from Duolingo notifications while your house is on fire.

And here's the kicker: as one part of your SDLC gets more productive with AI, the rest of the pipeline feels the pressure. It's like putting a turbocharger on one wheel of a shopping cart. Congrats, you now have a shopping cart that spins in circles really fast. The whole system needs to level up together, or you just create new and exciting bottlenecks.

The Simple Insight (And How We Implemented It)

The fix is embarrassingly obvious: give your team dedicated time to go deep on AI. No sprint pressure, no "squeeze this in between meetings." Just protected time to learn.

Our implementation was a two-week hackathon—which, to be clear, took real work to pull off. But the core insight is simpler than the execution.

I know. "Hackathon" has been beaten into meaninglessness by a thousand corporate team-building exercises and way too much pizza. This wasn't that.

We pulled the entire product organization (30 engineers, plus PMs, QA, and Design) away from sprint pressure for two full weeks, with 100% buy-in. Week one: individual learning and experimentation. Week two: team projects tackling real problems.

The key wasn't the hackathon format. It was creating structured space for depth. No velocity expectations. No sprint points. No "squeeze this in between your real work." Just time to go deep, fail spectacularly, and learn. We explicitly told engineers: If you don't fail at a few attempts this week, you haven't tried hard enough.

The Curriculum: What We Actually Had People Do

The Mindset Shift

Before diving into tools and techniques, we established two mental models:

  1. Adopt your agent's perspective. This is the single most important paradigm shift for agentic coding. Stop thinking about what you know. Consider what your agent sees, what it has access to, what context it has about your project and goals. It's like training a very smart golden retriever. You have to think about what they understand, not what you understand.

  2. Stop coding. Seriously. For the learning week, we told engineers: do your absolute best to prompt, not type. Hands off the keyboard. This forces you to actually learn the tools instead of falling back on muscle memory. It's uncomfortable. That's the point.

Week 1: Learning Week

Everyone worked mostly independently, with explicit permission to experiment wildly. The vibe was less "corporate training" and more "unsupervised science fair."

The deliverable? Friday presentations on what they learned, suggestions for new processes, and tool recommendations. Nothing fancy. Just "here's what I tried and here's what I learned." The forcing function of having to present keeps people honest.

Resources We Gave Everyone

We pointed people at courses from IndyDevDan, key videos, and essential reading. Full disclosure: my whole team took his courses, and they lit a fire under me to build AgentCMD.

Techniques to Experiment With

We gave engineers a menu of techniques to try. No need to hit them all. Pick a few that seem interesting and go embarrassingly deep:

  • Run multiple agent instances in parallel

    One helping you scope a feature, one auditing your codebase, one implementing. Feels like cheating. It's not.

  • Leverage subagents concurrently

    Gather research from multiple angles, then combine into a single recommendation. Like having a research team that doesn't need coffee breaks.

  • Compare models on the same task

    Do the same task with Claude, GPT, and Gemini. Build intuition for when each shines (and when each hallucinates confidently).

  • Use git worktrees

    Execute several independent features at once with git worktree add -b feat/add-mfa ../add-mfa origin/main (see the sketch after this list). Parallelism is your friend.

  • Spec-Driven Development

    Use Claude Code's plan mode, turn it into an executable spec with /create-plan, clear context, then /implement. Rinse and repeat until it works or you lose your mind.

  • Give your agent a browser

    Experiment with Playwright MCP so your agent can actually render and test frontend work. Give it eyes. Spooky, but effective.

  • Evaluate MCPs

    Sentry, Linear, Figma, Slite. What integrations would give your agent the context it needs? The more context, the less hallucination. Usually.
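
To make the worktree idea concrete, here's a minimal sketch of pairing git worktrees with parallel agent instances. It assumes the Claude Code CLI (claude) and its -p print mode; the branch names and prompts are placeholders, so adapt them to whatever you're actually experimenting with.

```bash
# Two isolated worktrees, so parallel agents never step on each other's files.
git worktree add -b feat/add-mfa ../add-mfa origin/main
git worktree add -b chore/log-audit ../log-audit origin/main

# Run one agent per worktree, concurrently. `claude -p` sends a single
# non-interactive prompt and prints the response; swap in your tool of choice.
(cd ../add-mfa && claude -p "Draft an implementation plan for adding MFA to login") > plan-mfa.md &
(cd ../log-audit && claude -p "Audit this codebase's logging for PII and summarize the risks") > log-audit.md &
wait  # both agents work in parallel; review the two reports when they finish

# Tear down the experiment when you're done.
git worktree remove ../add-mfa
git worktree remove ../log-audit
```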

Role-Specific Focus Areas

For Engineers:

  • Grab tickets from your backlog (or someone else's, we're all friends here) to experiment with
  • Try techniques you've never tried before. Go well outside your comfort zone. The cringe is part of the learning.
  • Analyze the state of context in your applications. Add CLAUDE.md files, slash commands, Cursor rules (a minimal CLAUDE.md sketch follows this list).
  • Do your own QA. No sprint expectations.
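
For the context item above, a useful starting point is a CLAUDE.md at the repo root. Here's a purely illustrative sketch; the specifics are placeholders, and the real exercise is writing down whatever an agent can't infer from the code alone.

```bash
# Seed a starter CLAUDE.md at the repo root. Contents below are placeholders;
# replace them with your project's real build steps, conventions, and gotchas.
cat > CLAUDE.md <<'EOF'
# Project context for coding agents

## Build and test
- `make test` runs the full suite; run it before declaring a change done.

## Conventions
- New business logic goes in service objects, not controllers or models.
- Never hand-edit generated files (anything under gen/).

## Gotchas
- Staging shares credentials with QA; don't run destructive scripts there.
EOF
```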

For Product / QA / Design:

  • Explore how AI can help with customer discovery
  • Look at AI-powered testing tools (Mabl, etc.)
  • Consolidate customer calls into a searchable, discoverable repository

For Everyone:

  • Experiment with frontier models you haven't touched (ChatGPT, Claude, Gemini)
  • Evaluate tools that could increase your efficiency
  • Identify process changes that embrace AI tooling

Week 2: Building Week

Cross-functional teams (devs, QA, design, PM) tackled real challenges. The overarching question: How do we continue to support increased velocity across the entire SDLC by properly leveraging AI?

Challenges You Could Pose

Here are some prompts to get your teams thinking about real problems AI could help solve:

  • Bug Resolution

    How can AI accelerate debugging and issue triage?

  • Technical Debt

    How can AI help us chip away at the backlog of tech debt that's been accumulating for years?

  • QA & Testing

    How can AI speed up testing without sacrificing coverage?

  • Code Review

    How can AI improve PR reviews while maintaining quality standards?

  • Environment Stability

    How do we keep staging stable as velocity increases?

  • Documentation

    How can AI help us maintain better documentation across the SDLC?

Teams proposed project ideas, we collaborated on scoping, and they built. Friday demos. Real output. Some projects were duds. That's fine. That's learning.

What Actually Happened

I'll be honest: I don't have fancy metrics to share. No "47% productivity increase" charts for your board deck. If that's what you need, this isn't the blog post for you. But if you want something harder to measure and more valuable, keep reading.

We could suddenly talk at a deeper level. Before the hackathon, conversations about AI were surface-level. After the hackathon, we could discuss context window management, prompt patterns for complex refactors, when to use multi-agent approaches vs. single prompts. We had shared vocabulary. Shared intuition. Shared war stories from the trenches.

More concretely, we adopted a policy that would have been impossible before: templating our engineering workflows and building a shared library of slash commands. Everyone understood why this mattered and how to contribute. That kind of organizational alignment doesn't come from Slack threads and lunch-and-learns. It comes from shared experience.
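
As a rough sketch of what that shared library can look like (the repo and command names are hypothetical, and this leans on Claude Code's convention of exposing each Markdown file under .claude/commands/ as a project slash command):

```bash
# Clone the team's shared command library (hypothetical repo).
git clone git@github.com:your-org/agent-commands.git ~/agent-commands

# Pull the shared commands into a project; each .md file becomes a /command.
mkdir -p .claude/commands
cp ~/agent-commands/commands/*.md .claude/commands/

# Contributing back: add a command you wrote locally to the shared repo.
cp .claude/commands/triage-sentry.md ~/agent-commands/commands/
(cd ~/agent-commands && git add commands/ && git commit -m "Add /triage-sentry" && git push)
```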

Some examples that stuck with me: One engineer discovered parallel Claude Code instances and immediately became the team's resident "why are you only running one agent?" guy. Our QA lead prototyped AI-assisted test generation and now won't shut up about it (affectionately). And two weeks later, I overheard someone say "let me spin up a subagent to audit this." Words that would've gotten you blank stares before.

Weeks Later

The momentum didn't stop when the hackathon ended:

  • A growing library of slash commands

    Teams are actively using, improving, and sharing them across the org

  • A two-person team completed 109 points in a single sprint

    By investing heavily in context setup and giving their agents eyes on what they were building

  • Developers adopting agent orchestration frameworks

    Some are now adapting our slash commands to work with tools like AgentCMD

  • Slash command show-and-tell at engineering all-hands

    It's now a regular segment, and people actually contribute what they've been building

There's no doubt in my mind that this hackathon instantly and permanently changed the quality of conversation we can have about AI adoption. That's not something you can put on a slide, but it's the foundation everything else gets built on.

How To Run Your Own

If you're convinced and want to try this with your team:

  • You don't need two weeks

    In hindsight, we could have gotten similar results in three days. The magic isn't in the duration. It's in the dedicated, protected time.

  • Make it real time off

    Not "hackathon but also check Slack." Actual protected time. If people are still getting pinged, it doesn't work.

  • Set the right expectations

    Success isn't shipping features. Success is experimenting, learning, and sharing. Say this explicitly and repeatedly.

  • Include everyone

    Engineers, PMs, QA, Design. The whole SDLC needs to level up, not just the people writing code.

  • Open the budget

    Tell people you'll buy whatever tools or courses they want to try. The ROI on a $200 course that levels up even one engineer is absurd.

  • End with sharing

    A presentation or demo at the end. The forcing function of "I need to show what I learned" drives deeper engagement.

  • Don't over-structure

    Point people at resources, but let them follow their curiosity. Adults learn best when they're curious, not following a syllabus.

Key Takeaways

Why It Worked

  • Permission to fail

    Sprint work has expectations attached. Nobody wants to be the person who spent three days on a ticket because they were "experimenting with AI." A dedicated learning period removes that pressure entirely. Failure isn't just allowed. It's expected. We said so explicitly, and more than once.

  • Protected time is non-negotiable

    Not "hackathon but also check Slack." Actual time away from sprint pressure, with nobody getting pinged back into delivery work. Without that, the depth never happens.

  • Cross-functional leveling

    AI doesn't just affect engineering. When your whole product org levels up together, you avoid the bottleneck problem where one function gets faster and everyone else drowns in the wake.

  • Shared vocabulary

    When everyone goes deep at the same time, you create shared reference points. "Remember when we tried X during the hackathon?" becomes institutional knowledge. Metrics are nice, but this foundation is what everything else gets built on.

  • Stop waiting for organic adoption

    It won't happen. Create the space intentionally.

What We'd Do Differently

  • One week, not two

    By week two, energy was flagging. The learning and bonding mostly happened in the first week anyway.

  • Clearer expectations for new leaders

    We gave leadership opportunities to people who hadn't led before (great for growth), but didn't define what "team lead" meant or how senior engineers should support without taking over.

  • Less developer-centric challenges

    Our prompts were too engineering-focused, which sidelined Design and PM. True cross-functional participation needs challenges that genuinely require everyone's expertise.

