SWEET: A New Watermarking Method for Safeguarding AI-Generated Code
As AI-generated code becomes more prevalent in software development, concerns over legal and ethical challenges grow. A recent study by researchers from Seoul National University and NAVER AI Lab presents an innovative watermarking method called Selective WatErmarking via Entropy Thresholding (SWEET) that improves both the quality of watermarked code and the detection of machine-generated code.
The Problem with AI-Generated Code
Large language models (LLMs) are becoming increasingly adept at generating executable code. With these advances come concerns about licensing and plagiarism, making watermarking an urgent need. However, existing watermarking techniques have not proven effective on code generation tasks.
Enter the researchers behind the new watermarking method, SWEET. Built on entropy-based selective watermarking, SWEET addresses the limitations of current methods, improving both the detection rate of machine-generated code and the quality of the watermarked code itself.
How SWEET Outperforms Existing Techniques
Existing approaches for detecting machine-generated code have focused on text similarity and zero-shot methods, while watermarking methods have relied on modifications to the original text or embedding watermarks during the sampling process. Notably, the latter has employed hash functions for watermarking.
SWEET, on the other hand, uses soft watermarking but applies it selectively, only to tokens generated with high entropy. This selective application yields considerable gains over prior methods in both imperceptibility and detection ability, navigating the trade-off between the two.
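The entropy gate at the heart of this idea can be sketched as follows. This is a minimal illustration, not the paper's exact configuration: the threshold value and the use of Shannon entropy over the softmax distribution are assumptions for demonstration.

```python
import math

def entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over raw logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_watermark(logits, threshold=1.2):
    # Watermark only when the model is uncertain (entropy above threshold),
    # leaving near-deterministic code tokens untouched. The threshold 1.2
    # is an illustrative value, not the one used in the paper.
    return entropy(logits) >= threshold
```

A uniform distribution over four tokens has entropy ln 4 ≈ 1.39 and would be watermarked; a sharply peaked distribution (one dominant logit) has entropy near zero and would be skipped.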
The Methodology Behind SWEET
At its core, SWEET inspects the model's next-token probability distribution at each generation step, taking into account the sequence generated so far and the decoding strategy. At the positions selected for watermarking, the algorithm pseudorandomly splits the vocabulary into a green group and a red group. The logits of green-group tokens are then increased, making them more likely to be sampled.
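A minimal sketch of the green/red split and logit boost, in the spirit of soft watermarking. The parameter names `gamma` (green-list fraction) and `delta` (logit boost), and seeding the split on the previous token, are illustrative assumptions rather than the paper's exact scheme.

```python
import random

def apply_watermark(logits, prev_token_id, gamma=0.5, delta=2.0):
    """Pseudorandomly split the vocabulary into green/red groups and
    boost green-group logits by delta (a "soft" watermark)."""
    vocab_size = len(logits)
    rng = random.Random(prev_token_id)  # context-seeded, so it is reproducible at detection time
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    green = set(ids[: int(gamma * vocab_size)])
    # Green tokens get a +delta bias; red tokens are left unchanged.
    biased = [x + delta if i in green else x for i, x in enumerate(logits)]
    return biased, green
```

Because the split is seeded deterministically from context, a detector that knows the seeding scheme can reconstruct the same green group without access to the generated randomness.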
To identify watermarked text, SWEET employs a one-sided z-test with a predetermined threshold. The detection procedure tokenizes the source code, computes the logit vector at each position, and counts how many of the high-entropy tokens fall in the green group. The resulting z-score is then compared against the threshold to decide whether the code is watermarked.
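The one-sided z-test can be illustrated as follows. The expected green fraction `gamma` and the threshold value `z_threshold` are illustrative assumptions, not the paper's settings.

```python
import math

def watermark_z_score(num_green, num_scored, gamma=0.5):
    """One-sided z-test: is the observed fraction of green-group tokens
    significantly higher than the fraction gamma expected by chance?"""
    expected = gamma * num_scored
    variance = num_scored * gamma * (1 - gamma)  # binomial variance
    return (num_green - expected) / math.sqrt(variance)

def is_watermarked(num_green, num_scored, gamma=0.5, z_threshold=4.0):
    # A high z-score means far more green tokens than chance predicts,
    # which is strong evidence of the watermark.
    return watermark_z_score(num_green, num_scored, gamma) >= z_threshold
```

For example, 90 green tokens out of 100 scored positions (with gamma = 0.5) gives z = (90 − 50)/5 = 8, well past a threshold of 4, while 55 out of 100 gives z = 1 and is indistinguishable from unwatermarked code.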
Notably, the entropy threshold lets SWEET trade off code correctness (pass@k) against detection performance (AUROC): a higher threshold watermarks fewer tokens and better preserves code quality, while a lower one strengthens the detection signal.
Experimental Results Showcase SWEET’s Potential
In their experiments, the researchers used LLaMA-13B and compared SWEET against other machine-generated text detection baselines (log p(x), LogRank, DetectGPT, and vanilla watermarking). SWEET delivered superior performance on critical metrics, such as true positive rate (TPR) and false positive rate (FPR), while keeping computational costs reasonable.
Moreover, the study revealed that the detection performance of existing methods such as DetectGPT falters significantly on code generation tasks, particularly when the underlying LLM struggles to produce high-quality code.
Limitations and Future Research
Despite its efficacy, SWEET has a few limitations. First, the entropy threshold is selected manually, so its choice could be further optimized. Second, SWEET requires access to the source code LLM for detection, restricting it to white-box settings (as with DetectGPT) and potentially imposing a computational burden on some users.
As research into AI-generated code advances, the development of new watermarking methods will be vital for addressing the legal, ethical, and security challenges that come with widespread adoption. The innovative SWEET method is an essential step towards ensuring the safe utilization of LLMs in the future.