Companies developing AI models, such as OpenAI and Meta, train their systems on enormous datasets. These datasets consist of text from newspapers, books (often sourced from unauthorized repositories), academic publications and various internet sources, and they include copyrighted works.