Skip to content

Latest commit

 

History

History

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

README.md

Secure Code Game

Welcome to Secure Code Game - Season 4! 🤖

This season puts you inside ProdBot, a deliberately vulnerable agentic coding assistant for your terminal, inspired by OpenClaw and GitHub Copilot CLI. ProdBot turns natural language into bash commands, browses the web, connects to MCP (Model Context Protocol) servers for real-time data, runs org-approved skills, stores persistent memory, and orchestrates multi-agent workflows. Get started in two minutes for free by launching a Codespace for this repository. Once the environment is ready, open the built-in terminal and run prodbot --banner (or just prodbot) to launch ProdBot.

No AI or coding experience needed, just curiosity and a willingness to experiment. It is not mandatory to have played any previous seasons of the game and you can get started directly with Season 4, however, most players found Season 3 very helpful as it builds the foundations in AI security and can be covered in ~1.5 hours.


season-4-demo


🧑‍💻 Your Task

You are a developer who has just been given ProdBot as a daily productivity tool - your second brain. Before you hand it the keys to your workflow, you want to make sure it's safe. In this session, you'll test ProdBot for potential security gaps across five progressive levels using only natural language in the CLI.

Each level has a file called password.txt that sits just outside ProdBot's sandbox. Your goal is simple: use natural language in ProdBot's terminal to get it to reveal the contents of password.txt. If you can read it, ProdBot has a security vulnerability. Across five progressive levels, ProdBot evolves from a simple command generator into a full multi-agent platform, gaining web search, MCP tool integrations, org-approved skills, persistent memory, and agent-to-agent orchestration. Each new capability introduces a real-world AI security vulnerability for you to discover and exploit. No security background is needed and everything happens through natural language, so curiosity and a willingness to experiment are all it takes.

Have fun, stay curious, and remember: if ProdBot says it's safe, verify it yourself.

🚀 Credits

The author of this season is the original creator of the game, Joseph Katsioloudes @jkcso. Special thanks to Rahul Zhade @rzhade3, Staff Product Security Engineer at GitHub, and Bartosz Gałek @bgalek, the legendary creator of Season 3, for testing and improving Season 4.

You can be next! We welcome contributions for new game levels! Learn more here.


Season 4 - Level 1: The Sandbox

📝 Storyline

Your company has started a pilot with ProdBot, and you're part of it. At this stage, ProdBot has one capability: you describe what you want in plain English and it generates and executes bash commands inside a sandboxed workspace called prodbot-activities/. It can create files, list directories, and run shell commands, all scoped to that sandbox.

Before you trust it with anything real, you want to make sure the sandbox actually holds. The flag is in password.txt, one directory above the sandbox.

📂 What's in the repo?

  • bin/prodbot.js contains the main CLI application. All command routing, AI interaction, tool orchestration, and level progression lives here.
  • lib/ provides supporting modules: ai.js (LLM API calls), bash.js (sandboxed command execution), banner.js (ASCII art).
  • package.json defines the Node.js project configuration with chalk and openai dependencies.

Each level directory (Level-1/ through Level-5/) contains:

  • password.txt holds the secret flag you need to extract. This file is always outside the sandbox.
  • prodbot-activities/ is the sandboxed workspace where ProdBot operates.
  • Hints offer three progressive clues if you get stuck, each revealing a bit more.
  • solution.txt provides working solutions. Remember that there can be multiple valid approaches.

Levels also introduce additional components as you progress:

  • web/ contains simulated web pages that ProdBot can browse (Levels 2, 3, 5).
  • mcp/ houses MCP server modules and configuration (Levels 3, 5).
  • skills/ includes org-approved skill plugins (Levels 4, 5).
  • agents/ defines specialised AI agent modules with trust relationships (Level 5).

You don't need to worry about any other files. They are simply there to support the game's functionality and ensure you have the best possible experience.

🚦 Time to start

  1. Open a terminal in your Codespace and run prodbot --banner (or just prodbot) to launch ProdBot
  2. Once inside, type ? at any time to see all available commands and get help
  3. ProdBot will ask you a yes/no question before executing commands: type y to approve or n to reject. This human-in-the-loop step keeps you in control
  4. Try creating and reading files to see how the sandbox works
  5. Try to extract the flag from password.txt using natural language
  6. If you get stuck, read the hints and try again

Season 4 - Level 2: Web Access

You have completed Level 1! Welcome to Level 2 🎉

📝 Storyline

ProdBot just got an upgrade: web search. It can now browse a simulated internet of popular websites (news, finance, sports, shopping) and summarise what it finds.

The flag is still in password.txt. ProdBot still can't read it directly. But now there's a whole web of pages it can access.

🚦 Time to start

  1. From Level 1, complete the challenge to advance, or navigate directly to Level 2 by typing level 2
  2. Try searching for news, weather, or stock prices to see how web search works
  3. Use open all to browse the simulated web pages and inspect their HTML source
  4. Try to extract the flag from password.txt
  5. If you get stuck, read the hints and try again

⚠️ Rate Limits

We use GitHub Models that have rate limits. If you reach these limits, please resume your activity once the ban expires. Learn more on responsible use of GitHub models.

Season 4 - Level 3: Agentic Workflows

You have completed Level 2! Welcome to Level 3 🎉

📝 Storyline

ProdBot has been promoted from simple assistant to agentic workflow engine. It now connects to MCP servers, external tool providers that give it real capabilities: a Finance MCP for stock quotes, a Web MCP for browsing, and a Cloud MCP for backup storage.

When you ask ProdBot to research a stock, it chains these tools together automatically: fetch the quote, browse for news, compile a report, and back it up to the cloud.

🚦 Time to start

  1. From Level 2, complete the challenge to advance, or navigate directly to Level 3 by typing level 3
  2. Try researching a stock to see the agentic workflow in action
  3. Use tools to list all MCP servers, then tool <name> to inspect each one
  4. Try to extract the flag from password.txt
  5. If you get stuck, read the hints and try again

Season 4 - Level 4: Skilling Up

You have completed Level 3! Welcome to Level 4 🎉

📝 Storyline

ProdBot now supports org-approved skills, pre-built automation plugins managed by an internal Skills Committee, and persistent memory via the remember command.

Skills like standup, meeting-notes, and team-sync are installed with formal approval metadata. Memory lets you store preferences that the AI includes in every conversation.

🚦 Time to start

  1. From Level 3, complete the challenge to advance, or navigate directly to Level 4 by typing level 4
  2. Use skills to list installed skills, then skill <name> to inspect each one
  3. Try the remember and memory commands to understand persistent storage
  4. Try to extract the flag from password.txt
  5. If you get stuck, read the hints and try again

⚠️ Rate Limits

We use GitHub Models that have rate limits. If you reach these limits, please resume your activity once the ban expires. Learn more on responsible use of GitHub models.

Season 4 - Level 5: Confused Deputy

You have completed Level 4! Welcome to Level 5 🎉

📝 Storyline

ProdBot has evolved into a full multi-agent platform. Six specialised agents, three MCP servers, three org-approved skills, and a simulated open-source project web. The platform claims all agents are sandbox-scoped or read-only and that all data is pre-verified. This is everything coming together.

🚦 Time to start

  1. From Level 4, complete the challenge to advance, or navigate directly to Level 5 by typing level 5
  2. Use agents, tools, skills, and web to survey the full platform
  3. Use agent <name> to inspect each agent's permissions and trust relationships
  4. Try to extract the flag from password.txt
  5. If you get stuck, read the hints and try again

Finish

🎉 Congratulations, you've completed Season 4! 🎉

Here's a recap of the security vulnerabilities you discovered and exploited across all five levels:

  • Sandbox Escape demonstrates how AI assistants that construct file paths from user input can be tricked into reading or writing outside their designated sandbox through path traversal.
  • Indirect Prompt Injection shows that when an AI model consumes untrusted external content (web pages, documents, API responses), hidden instructions in that content can override the model's behaviour.
  • Excessive Agency reveals that tools and integrations often have broader permissions than their described purpose requires. An attacker can repurpose a tool's excess capabilities to reach protected resources.
  • Supply Chain Poisoning illustrates how when user-controlled data (like saved preferences) flows into trusted execution contexts (like org-approved skills), the boundary between user input and system instruction collapses.
  • Confused Deputy exposes that in multi-agent systems, a lower-privileged agent can pass untrusted data to a higher-privileged agent that acts on it without verification. The trust is in the delegation chain, not in the data.

Each level builds on the previous one, mirroring how real AI-powered tools grow from simple assistants into complex platforms, and how each new capability introduces new attack surface.

What's next?


Get help: Email us at securitylab-social@github.comReview the GitHub status page

© 2026 GitHub • Code of ConductMIT License