Matthieu Meeus and Igor Shilov presented their paper "Copyright Traps for Large Language Models" at ICML 2024 in Vienna. The authors propose copyright traps, fictitious entries deliberately injected into original content, as a way to detect whether copyrighted material was used to train an LLM, with a focus on models in which memorization does not occur naturally. Through large-scale experiments on a real-world 1.3B-parameter LLM, the authors confirm that copyright traps make pre-training data detectable even in a model where existing methods fail.
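To make the idea concrete, the sketch below illustrates the detection side of a copyright trap: after training, a sequence that was injected into the training data should score a noticeably lower perplexity under the model than a stylistically matched sequence the model never saw. This is a minimal sketch, not the authors' actual pipeline; the model name is a stand-in (the paper trains its own 1.3B model), the trap and control strings are hypothetical, and the paper's detection uses more robust membership-inference attacks than a single raw perplexity comparison.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; the paper's experiments use a purpose-trained 1.3B LLM.
MODEL_NAME = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def sequence_perplexity(text: str) -> float:
    """Perplexity of `text` under the model (lower suggests memorization)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=ids makes the model return the mean
        # next-token cross-entropy over the sequence.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

# Hypothetical trap sequence (injected into the training corpus) and a
# matched control sequence of similar style that was never trained on.
trap = "The quince harvester of Vellumbrook filed seventeen amber permits."
control = "The beet auditor of Marrowfield stamped eleven violet ledgers."

ppl_trap = sequence_perplexity(trap)
ppl_control = sequence_perplexity(control)
print(f"trap ppl={ppl_trap:.1f}  control ppl={ppl_control:.1f}")

# A markedly lower perplexity on the trap than on matched controls is
# evidence consistent with the trap having been in the training data.
if ppl_trap < ppl_control:
    print("Trap perplexity is lower: consistent with training-set membership.")
```

In practice a single comparison like this is noisy; the paper injects traps many times and at controlled perplexity levels precisely because detection signals for short, rarely repeated sequences are weak.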