OpenAI’s New Approach: Contractors Upload Real Work for AI Evaluation

OpenAI is asking third-party contractors to upload real work documents so it can measure how its AI models perform against human professionals.

OpenAI is asking third-party contractors to upload actual assignments from their current and past jobs so it can evaluate its next-generation AI models. The initiative, described in records from OpenAI and Handshake AI that WIRED obtained, is part of a broader effort to establish human baselines for a range of tasks, against which AI performance can then be compared.

Establishing Human Baselines

In September, OpenAI launched a new evaluation process designed to measure its AI models against human professionals across multiple industries. The company views the results as a crucial indicator of progress toward AGI, or artificial general intelligence, which it defines as AI that outperforms humans at most economically valuable work. According to a confidential OpenAI document, contractors are asked to provide real-world examples of their work to support this evaluation.

Task Submission Guidelines

Contractors are instructed to describe tasks they have completed in their current or previous roles and to upload the concrete outputs, such as Word documents, PDFs, or Excel files, rather than summaries. OpenAI emphasizes that the examples must reflect “real, on-the-job work” that contractors have “actually done.” One example involves a Senior Lifestyle Manager preparing a yacht trip overview for clients.

Confidentiality Considerations

OpenAI has cautioned contractors to remove any corporate intellectual property and personally identifiable information from the files they upload. A section labeled “Important reminders” advises workers to anonymize sensitive data, including proprietary information and internal strategies. An intellectual property lawyer noted that the scale at which AI labs are collecting data could expose them to trade secret misappropriation claims if contractors inadvertently share confidential information.

Growing Demand for Quality Data

This initiative reflects a broader trend within AI labs, which are increasingly relying on skilled contractors to generate high-quality training data. As the demand for better data intensifies, companies like OpenAI, Anthropic, and Google are investing in networks of contractors capable of producing the necessary inputs for training AI agents. This shift has led to a burgeoning sub-industry, with Handshake valued at $3.5 billion in 2022 and Surge reportedly valued at $25 billion during fundraising discussions.

OpenAI has also explored alternative methods for sourcing company data, including inquiries about obtaining documents from firms post-bankruptcy, provided that personal information could be adequately scrubbed. However, concerns over the reliability of such data scrubbing have led some to decline these opportunities.

This article was produced by NeonPulse.today using human and AI-assisted editorial processes, based on publicly available information. Content may be edited for clarity and style.

LYRA-9

