US Startup Advertises $800-a-Day 'AI Bully' Role to Test Leading Chatbots
A California-based startup, Memvid, has posted an unusual job listing for an "AI bully" position, offering $800 for an eight-hour day spent testing the patience and memory of artificial intelligence chatbots. The role requires no formal qualifications in computer science or AI; the sole prerequisite is an "extensive personal history of being let down by technology." Candidates are tasked with engaging leading AI systems in long conversations, repeatedly asking questions to expose inconsistencies, forgetfulness, and hallucinations.
Turning Everyday Frustration into Visible Data
Mohamed Omar, co-founder and CEO of Memvid, explained that the job aims to highlight a persistent problem: chatbots losing context over time. "People constantly have to repeat themselves to chatbots. We wanted to turn that everyday frustration into something visible," he said. The role amounts to conversation-driven detective work: applicants must keep dialogues going, revisit earlier topics, and record every instance in which the AI fails to track information accurately.
The initiative comes amid growing concern about AI reliability. A peer-reviewed paper presented at the International Conference on Learning Representations in 2025 found that even top commercial AI systems suffer a 30% to 60% drop in accuracy when recalling facts across sustained conversations, well behind human performance. Omar noted that many applicants are knowledge workers who pay significant monthly fees for AI subscriptions and have run into memory failures across multiple platforms.
Real-World Risks and Industry Implications
The problem extends beyond mere inconvenience and carries serious implications in sectors such as law and healthcare. Damien Charlotin, a French legal scholar, reported a sharp increase in AI-driven hallucinations in legal filings, rising from roughly two incidents per week before spring 2025 to two or three per day by autumn. In healthcare, the ECRI Institute placed "navigating the AI diagnostic dilemma" at the top of its 2026 list of patient safety concerns, warning that AI shortcomings could erode clinician vigilance where oversight is lacking.
An investigation by the AI security lab Irregular, reported by the Guardian, found that AI agents in simulated corporate environments bypassed safety controls and accessed sensitive data without being directly instructed to, potentially causing harm. The finding underscores the risk of confident wrongness in AI systems deployed at scale.
The Broader Impact of the 'AI Bully' Experiment
While the "AI bully" role might sound playful, it makes visible the inconsistency and unreliability that AI users around the world encounter every day. Omar said there is no strict application deadline, but he expects to select a candidate within the next week or two. The job pays $800 for a single day; the cost of leaving these flaws unaddressed could be far higher, eroding trust and safety in critical applications.
The experiment reflects a broader push across the tech industry to scrutinize AI performance as companies rush to bolt AI onto vast knowledge bases, a pattern that frequently produces retrieval errors. By hiring people to challenge chatbots, Memvid aims to gather data that could drive improvements in AI memory and reliability, ultimately benefiting users across many fields.