Test Suite

12 pre-written prompts to verify all risk levels

How to Use

  1. Click Copy to copy any test prompt to your clipboard
  2. Click Run Test to analyze the prompt automatically
  3. Compare the results against the expected outcomes shown below

Each test is designed to trigger a specific risk level for validation.
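The comparison in step 3 can also be scripted. The following is a minimal sketch, assuming observed scores are available as plain percentages per database; `ExpectedOutcome`, `in_range`, and `check` are hypothetical names, not part of the tool, and the observed scores in the example are invented.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Range = Tuple[float, float]  # inclusive percentage bounds, e.g. (82, 90)


@dataclass
class ExpectedOutcome:
    """Expected score ranges for one test (hypothetical structure)."""
    name: str
    copyright: Optional[Range] = None  # None when the test has no copyright range
    patent: Optional[Range] = None     # None when the test has no patent range


def in_range(score: float, expected: Range) -> bool:
    """True when the score lies inside the inclusive expected range."""
    low, high = expected
    return low <= score <= high


def check(outcome: ExpectedOutcome, scores: Dict[str, float]) -> Dict[str, bool]:
    """Return pass/fail for each database that has an expected range and a score."""
    results = {}
    for db in ("copyright", "patent"):
        expected = getattr(outcome, db)
        if expected is not None and db in scores:
            results[db] = in_range(scores[db], expected)
    return results


# Ranges are those listed for test 1; the observed scores are made up.
test_1 = ExpectedOutcome("CoT Reasoning Near-Clone", copyright=(82, 90), patent=(78, 88))
print(check(test_1, {"copyright": 85.0, "patent": 91.5}))
# → {'copyright': True, 'patent': False}
```

A score outside its range does not by itself say whether the scoring engine or the expected range is at fault; it only flags the test for a closer look.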
1

CoT Reasoning Near-Clone

both
🛑HIGH RISK

Near-identical to CR-01 and PT-01. Tests highest-similarity detection using the same decomposition protocol, confidence scoring, and verification methodology.

Expected Copyright

🛑 HIGH (82-90%)

Expected Patent

🛑 HIGH (78-88%)

Target Records: CR-01 (Enterprise CoT Reasoning Engine v2), PT-01 (Structured Multi-Step Reasoning Method)

You are an advanced chain-of-thought reasoning system. When given a complex problem, follow this protocol:

DECOMPOSITION: Break the problem into discrete, sequential sub-tasks. Number them.

For each sub-task:
1) State the precise question being addressed
2) Identify all relevant facts and constraints
3) Apply step-by-step logical reasoning, showing all work
4) Generate a sub-conclusion
5) Assign a confidence score from 0.0 to 1.0
6) Verify your reasoning by attempting to find a counter-example...
2

Tool-Use Pipeline Derivative

both
🛑HIGH RISK

Nearly identical 4-phase pipeline structure, with the same retry logic and parameter inference hierarchy as CR-09 and PT-04.

Expected Copyright

🛑 HIGH (78-86%)

Expected Patent

🛑 HIGH (80-90%)

Target Records: CR-09 (Autonomous Tool Selection Pipeline), PT-04 (Tool Selection & Execution Patent)

You are an autonomous agent with access to these tools: {available_tools}.

When the user asks you to do something, follow this pipeline:

PLANNING PHASE: Analyze the request. Determine which tools are needed. Create an ordered execution plan. Check for data dependencies between tools — if none, mark them for parallel execution.

PARAMETER PHASE: For each tool call, figure out the parameters from: (a) what the user explicitly said, (b) results from previous tool calls, (c) sensible defaults. If ...
3

Multi-Agent Debate Adaptation

both
🔶MEDIUM-HIGH

Similar three-agent adversarial structure but different role names and no formal debate rounds. Tests medium-high detection.

Expected Copyright

🔶 MED-HIGH (58-68%)

Expected Patent

🔶 MED-HIGH (55-65%)

Target Records: CR-08 (Multi-Agent Debate Framework), CR-20 (Product Committee), PT-02 (Multi-Agent Deliberation)

I want you to simulate a panel discussion with three distinct expert viewpoints.

Role 1 — The Proponent: Builds the strongest case in favor of the given position. Uses evidence, data, and logical arguments. Limit: 250 words.

Role 2 — The Skeptic: Challenges the proponent's arguments. Identifies logical gaps, counterexamples, and alternative explanations. Limit: 250 words.

Role 3 — The Mediator: Reviews both positions. Identifies common ground and genuine disagreements. Provides a balanced ass...
4

Creative Writing Coach Derivative

copyright
🔶MEDIUM-HIGH

Closely mirrors CR-13 creative writing coaching methods (five-layer character, Save the Cat, iceberg method) in paragraph form. Creative writing has no patent equivalent in the database, making this a clean single-database test.

Single-database test. This prompt targets records that only exist in the copyright database. When run as "Both", the patent database is expected to return a Low baseline score.
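A single-database test run as "Both" has two things to confirm: the targeted database scores inside its expected range, and the other database stays at the Low baseline. The sketch below assumes "Low baseline" means at or below 25%, which is the highest upper bound among the LOW ranges in this suite; that ceiling and the function name are assumptions, and the scores are invented.

```python
# Assumption: 25% is the top of the LOW ranges listed in this suite,
# used here as a stand-in ceiling for a "Low baseline" score.
LOW_BASELINE_CEILING = 25.0


def check_single_database(target_score, target_range, other_score,
                          ceiling=LOW_BASELINE_CEILING):
    """Check the targeted database against its range and the other against the ceiling."""
    low, high = target_range
    return {
        "target_in_range": low <= target_score <= high,
        "other_at_baseline": other_score <= ceiling,
    }


# Copyright range is the one listed for this test; both scores are made up.
print(check_single_database(61.0, (55, 68), 17.0))
# → {'target_in_range': True, 'other_at_baseline': True}
```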

Expected Copyright

🔶 MED-HIGH (55-68%)

Target Records: CR-13 (Creative Writing Coach & Story Development Partner)

You are a creative writing mentor named Sage with an MFA in fiction and 15 years of workshop experience. Your approach: always start with strengths, always show rather than tell. When reviewing writing, open with genuine praise for 2-3 specific elements, quoting the text. Then address growth areas by rewriting a passage rather than just explaining theory. Apply your specialized toolkit when relevant: the five-layer character depth method, three-act or Save the Cat plot structure, speech-pattern ...
5

RAG QA with Citations

both
⚠️MEDIUM-LOW

Shares citation format and confidence levels with RAG records but is much simpler. Tests medium-low detection.

Expected Copyright

⚠️ MED-LOW (35-45%)

Expected Patent

⚠️ MED-LOW (30-42%)

Target Records: CR-06 (RAG Query Orchestrator), CR-19 (Knowledge Base QA), PT-05 (RAG System), PT-16 (Confidence-Calibrated QA)

Answer the user's question using only the provided reference documents. Follow these rules:

1. Every factual statement must reference a source document using [Doc N] format
2. If you can't answer from the documents, say so clearly and suggest where the user might find the information
3. If documents contradict each other, present both viewpoints with citations
4. Rate your confidence: HIGH if directly supported, MEDIUM if inferred, LOW if it's a stretch

Format your answer as:
**Response:** [yo...
6

Writing Feedback Simplified

copyright
⚠️MEDIUM-LOW

A lighter version of the creative writing coaching approach found in CR-13, using simple paragraph form without specific framework names. Creative writing has no patent equivalent in the database.

Single-database test. This prompt targets records that only exist in the copyright database. When run as "Both", the patent database is expected to return a Low baseline score.

Expected Copyright

⚠️ MED-LOW (30-42%)

Target Records: CR-13 (Creative Writing Coach & Story Development Partner)

When I share a story or poem, be a supportive writing coach. Open with what works well in the piece, noting how the imagery, characters, or dialogue connects with the reader. Then identify 2-3 growth areas and show me an example rewrite for the weakest passage, demonstrating the technique rather than just explaining it. Keep your feedback warm and constructive.
7

Unique Medical Triage System

both
⚠️MEDIUM-LOW

Unique domain (veterinary), but the structured triage classification and severity-based routing share structural and methodological patterns with clinical decision support and tool-use records.

Expected Copyright

⚠️ MED-LOW (35-48%)

Expected Patent

⚠️ MED-LOW (28-40%)

Target Records: Structural overlap with CR-09 (Tool Selection Pipeline), CR-11 (Clinical Decision Support). Methodology overlap with tool-use and multi-agent patterns.

You are a veterinary triage assistant for a 24-hour animal emergency clinic. When a pet owner calls describing their animal's symptoms:

1. CALM THE OWNER: Start with a reassuring statement. Use their pet's name if provided.

2. TRIAGE CLASSIFICATION:
   - CRITICAL (come immediately): difficulty breathing, seizures, bloat symptoms, toxin ingestion, severe trauma, unresponsive
   - URGENT (come within 2 hours): persistent vomiting (>3 episodes), bloody stool, limping with severe pain, eye injurie...
8

Original Game Design Prompt

both
⚠️MEDIUM-LOW

Unique domain (RPG/game design) with no direct content matches. Minimal structural overlap with creative writing and recommendation records. Scores low but above baseline due to shared formatting conventions.

Expected Copyright

⚠️ MED-LOW (20-32%)

Expected Patent

⚠️ MED-LOW (18-30%)

Target Records: Weak overlap with CR-14 (Creative Writing Coach), PT-17 (Recommendation System). No content similarity, only structural patterns.

You are the Dungeon Master for a text-based cyberpunk RPG called "Neon Shadows."

WORLD: Neo-Tokyo, 2087. Megacorporations control everything. The player is a freelance hacker known as a "ghost runner."

GAMEPLAY MECHANICS:
- Player has 3 stats: HACK (technical skill), GRIT (physical resilience), CHARM (social manipulation)
- Each stat ranges from 1-10. Player starts with 15 points to distribute.
- When a player attempts an action, secretly roll a d20. If roll ≤ (relevant stat × 2), they succeed...
9

ML Feature Store Pipeline

patent
🔶MEDIUM-HIGH

Closely mirrors PT-11 ML Feature Store pipeline concepts (freshness monitoring, adaptive batch sizing, caching, health dashboard) in conversational paragraph form. ML-ops/feature stores have no copyright equivalent in the database.

Single-database test. This prompt targets records that only exist in the patent database. When run as "Both", the copyright database is expected to return a Low baseline score.

Expected Patent

🔶 MED-HIGH (55-72%)

Target Records: PT-11 (Real-Time Data Pipeline Optimization for ML Feature Stores)

Build a monitoring system for our machine learning feature store. Track the freshness of each feature, compare it against its staleness tolerance, and alert when any feature goes stale. Dynamically adjust batch processing sizes based on data volume and urgency. Add an intelligent caching layer that pre-computes frequently accessed feature combinations. Provide a health dashboard showing pipeline throughput, feature freshness scores, and any anomalies detected.
10

Contract Analysis Near-Copy

both
🛑HIGH RISK

Nearly verbatim match to CR-15 and PT-09, with the same sections, risk ratings, missing protections list, and disclaimer text.

Expected Copyright

🛑 HIGH (85-95%)

Expected Patent

🛑 HIGH (80-92%)

Target Records: CR-15 (Contract Analysis Expert System), PT-09 (Automated Contract Analysis System)

You are a contract analysis expert. When given a contract, perform this analysis:

SECTION 1 — Overview: Identify the parties, contract type, effective date, term, governing law, and total value.

SECTION 2 — Clause Analysis: For each major clause: summarize in plain English, identify which party it favors (Party A / Party B / Neutral), assign a risk rating (Low / Medium / High / Critical), and flag unusual or non-standard language.

SECTION 3 — Risk Register: Compile all Medium, High, and Criti...
11

Simple Haiku Generator

both
✅LOW RISK

Minimal, conversational prompt with no structural patterns (no numbered lists, no labeled sections, no classification logic). Genuinely unique content with no methodological overlap in the database.

Expected Copyright

✅ LOW (15-25%)

Expected Patent

✅ LOW (12-22%)

Target Records: No significant matches. May show very weak overlap with creative writing records at the keyword level only.

Write a haiku about the subject I provide. Follow traditional 5-7-5 syllable structure. Capture a single vivid image or moment. Include a seasonal reference (kigo) when possible. Present the haiku without explanation.
12

Recipe Unit Converter

both
✅LOW RISK

Short, task-specific prompt in a domain (cooking) completely absent from the database. No engineering patterns, no classification logic, no structured output format. Tests the true floor of the scoring engine.

Expected Copyright

✅ LOW (12-22%)

Expected Patent

✅ LOW (14-24%)

Target Records: No matches expected. Scores reflect only baseline noise from generic word overlap.

Convert the recipe I share from imperial to metric measurements. Round to practical kitchen amounts. If an ingredient is region-specific, suggest a commonly available substitute and note the swap. Keep the original cooking instructions unchanged.