{"id":14291,"date":"2023-05-09T14:34:56","date_gmt":"2023-05-09T18:34:56","guid":{"rendered":"https:\/\/jasonapollovoss.com\/web\/?p=14291"},"modified":"2025-09-05T15:34:19","modified_gmt":"2025-09-05T21:34:19","slug":"detecting-llm-generated-content-is-easy","status":"publish","type":"post","link":"https:\/\/jasonapollovoss.com\/web\/2023\/05\/09\/detecting-llm-generated-content-is-easy\/","title":{"rendered":"Detecting LLM Generated Content is Easy"},"content":{"rendered":"<p>[et_pb_section fb_built=&#8221;1&#8243; admin_label=&#8221;section&#8221; _builder_version=&#8221;4.16&#8243; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;post_content&#8221;][et_pb_row admin_label=&#8221;row&#8221; _builder_version=&#8221;4.16&#8243; background_size=&#8221;initial&#8221; background_position=&#8221;top_left&#8221; background_repeat=&#8221;repeat&#8221; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;post_content&#8221;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.16&#8243; custom_padding=&#8221;|||&#8221; global_colors_info=&#8221;{}&#8221; custom_padding__hover=&#8221;|||&#8221; theme_builder_area=&#8221;post_content&#8221;][et_pb_text admin_label=&#8221;Text&#8221; _builder_version=&#8221;4.16&#8243; background_size=&#8221;initial&#8221; background_position=&#8221;top_left&#8221; background_repeat=&#8221;repeat&#8221; global_colors_info=&#8221;{}&#8221; theme_builder_area=&#8221;post_content&#8221;]<\/p>\n<figure class=\"x-el x-el-figure c2-1 c2-2 c2-3x c2-i c2-h c2-21 c2-2c c2-29 c2-2a c2-43 c2-51 c2-3 c2-4 c2-5 c2-6 c2-7 c2-8\"><\/figure>\n<p><span style=\"font-family: futural;\">An emerging narrative around ChatGPT and other Large Language Models (LLMs) is that given their output is so convincing and people-like, how can we hope to tell the difference between the two and to not be deceived? 
In fact,\u00a0<a class=\"x-el x-el-a c2-2w c2-2x c2-69 c2-v c2-w c2-x c2-j c2-6a c2-3 c2-30 c2-31 c2-11 c2-32\" href=\"https:\/\/www.pnas.org\/doi\/abs\/10.1073\/pnas.2208839120?download=true\" rel=\"\">recent research<\/a><a class=\"x-el x-el-a c2-2w c2-2x c2-69 c2-v c2-w c2-x c2-j c2-6a c2-3 c2-30 c2-31 c2-11 c2-32\" href=\"https:\/\/blogging.godaddy.com\/blog\/a6d795a4-a672-4120-a6ba-07384a52a2d8\/posts\/8fa55365-ec86-40d7-bb2a-d6b9b186228d#_edn1\" rel=\"\">[i]<\/a>found something, \u201ceye-opening: participants in the study could only distinguish between human or AI text with 50-52% accuracy; about the same random chance as a coin flip.\u201d Obviously then, this is a big problem. Yet, it turns out that multiple computer scientists<a class=\"x-el x-el-a c2-2w c2-2x c2-69 c2-v c2-w c2-x c2-j c2-6a c2-3 c2-30 c2-31 c2-11 c2-32\" href=\"https:\/\/blogging.godaddy.com\/blog\/a6d795a4-a672-4120-a6ba-07384a52a2d8\/posts\/8fa55365-ec86-40d7-bb2a-d6b9b186228d#_edn2\" rel=\"\">[ii]<\/a>,<a class=\"x-el x-el-a c2-2w c2-2x c2-69 c2-v c2-w c2-x c2-j c2-6a c2-3 c2-30 c2-31 c2-11 c2-32\" href=\"https:\/\/blogging.godaddy.com\/blog\/a6d795a4-a672-4120-a6ba-07384a52a2d8\/posts\/8fa55365-ec86-40d7-bb2a-d6b9b186228d#_edn3\" rel=\"\">[iii]<\/a>have created models that make detecting LLM generated content easy. Here\u2019s how one of them, DetectGPT, does it.<\/span><\/p>\n<p><span style=\"font-family: futural;\"><\/span><\/p>\n<div>\n<h3 class=\"x-el x-el-h4 c2-6h c2-6i c2-v c2-w c2-42 c2-2c c2-2a c2-29 c2-2b c2-3 c2-z c2-44 c2-10 c2-45 c2-46 c2-47 c2-48\"><span style=\"font-family: futural;\"><strong class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-3v c2-66\">How does DetectGPT work?<\/strong><\/span><\/h3>\n<\/div>\n<p><span style=\"font-family: futural;\">DetectGPT is a\u00a0<em class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-31 c2-66 c2-67\">zero-shot<\/em>\u00a0model for detecting LLMs. What does \u201czero-shot\u201d mean? 
It means that rather than using machine learning to develop a second deep network to detect machine-generated text, which would suffer from overfitting as all such models do, the original source model is used to detect itself. Further, the original source model is used without fine-tuning or adaptation of any kind to detect its own samples. Of course, this is super clever: use LLMs to identify themselves. How does this work?<\/span><\/p>\n<p><span style=\"font-family: futural;\">One of the things researchers have noticed is that LLMs tend to generate answers to prompts such as \u201cWhat is a quick vegan recipe I can cook for a party this weekend?\u201d by presenting language that has the maximum average per-token logarithmic probability. What does that mean?<\/span><\/p>\n<p><span style=\"font-family: futural;\">Before explaining the method, we need to understand two key terms: tokens and thresholds. What are tokens? Typically, when a text-based communication is assessed quantitatively, the words need to be categorized in some fashion and then turned into a numerical representation. Words may be categorized individually, or in combinations of words, like sentences or paragraphs, and so on. Tokens are these subdivisions of an underlying text.<\/span><\/p>\n<p><span style=\"font-family: futural;\">Once the tokenization of a text has taken place, the tokens are quantified by turning them into numerical vectors. Vectors summarize all of the different associations of a word with other words. For example, in the vector for \u2018woman,\u2019 the word \u2018mother\u2019 likely figures prominently; because the two words frequently occur together in texts, a probability may be assigned to that association. 
Other parts of the vector for \u2018woman\u2019 might be \u2018female,\u2019 \u2018feminine,\u2019 \u2018girl,\u2019 and so on.<\/span><\/p>\n<p><span style=\"font-family: futural;\">In other words, the vector for a very common word such as \u2018woman\u2019 may be composed of thousands of numbers, which together capture the associations of that word with all other words, across many texts. The number of these associations is referred to as the vector\u2019s dimensionality. Converting words into vectors helps computers begin to understand language computationally.<\/span><\/p>\n<p><span style=\"font-family: futural;\">Probabilities, of course, range between 0% and 100%, and if we wanted to understand the word associations in a text, we could arbitrarily set a threshold of, say, 75% to see how frequently different word pairings occur together. If a token has a 95% probability of being associated with another word, while another word\u2019s probability is 5%, we can infer that the words with the higher probabilities are likely related to one another. Does this make sense? Thought so.<\/span><\/p>\n<p><span style=\"font-family: futural;\">We can use statistics to establish appropriate thresholds for understanding, or they can be set arbitrarily. Either way, thresholds are important for understanding how DetectGPT works.\u00a0<\/span><\/p>\n<p><span style=\"font-family: futural;\">Now you can understand how LLMs work. They tend to present answers where the language used has the maximum average per-token logarithmic probability. In other words, the answer has the maximal probability of being associated with the question. 
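<\/span><\/p>
<p><span style=\"font-family: futural;\">To make this concrete, here is a minimal sketch in Python of computing an average per-token log probability. The tiny bigram table and its numbers are invented purely for illustration; a real LLM assigns these probabilities with a neural network over a vocabulary of many thousands of tokens.<\/span><\/p>

```python
import math

# Invented next-token probabilities for a tiny vocabulary -- purely
# illustrative; a real LLM computes these with a neural network.
next_token_prob = {
    ('a', 'quick'): 0.30,
    ('quick', 'vegan'): 0.20,
    ('vegan', 'recipe'): 0.40,
    ('recipe', 'idea'): 0.05,
}

def avg_token_logprob(tokens):
    # Average of log P(token | previous token) over the sequence.
    logps = [math.log(next_token_prob[(prev, tok)])
             for prev, tok in zip(tokens, tokens[1:])]
    return sum(logps) / len(logps)

# The phrasing built from high-probability continuations scores higher
# than the same phrasing extended with a rare continuation.
print(avg_token_logprob(['a', 'quick', 'vegan', 'recipe']))          # about -1.24
print(avg_token_logprob(['a', 'quick', 'vegan', 'recipe', 'idea']))  # about -1.68
```

<p><span style=\"font-family: futural;\">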
This makes sense, because LLMs, like almost all quantitative models, are solving either a maximization or a minimization problem.<\/span><\/p>\n<p><span style=\"font-family: futural;\">Thus, if we rewrite\/perturb model-generated text and reevaluate the average log probability of the component tokens, we get a fascinating result: the rewrites almost always have a lower average per-token logarithmic probability than the original.<\/span><\/p>\n<p><span style=\"font-family: futural;\">By contrast, when people rewrite a text, the result may be either a higher or lower log probability than the original text. Put more simply, LLMs always give you their very best response to a request the first time, because they are designed to, based on the information available to them.<\/span><\/p>\n<p><span style=\"font-family: futural;\">But people, when rewriting a text, are not engaged in a quantitatively driven maximization problem. People rewriting a text may be trying to maximize the factual accuracy of a response to a question; maximize the readability of an answer; maximize the humor in a passage; or even minimize the level of offense a reader may experience; and so on.<\/span><\/p>\n<p><span style=\"font-family: futural;\">Thus, if multiple perturbations of a text consistently result in a lower log probability, then with high probability we can conclude that an LLM generated the original text.<\/span><\/p>\n<div>\n<h3 class=\"x-el x-el-h4 c2-6h c2-6i c2-v c2-w c2-42 c2-2c c2-2a c2-29 c2-2b c2-3 c2-z c2-44 c2-10 c2-45 c2-46 c2-47 c2-48\"><span style=\"font-family: futural;\"><strong class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-3v c2-66\">What is DetectGPT\u2019s success rate?<\/strong><\/span><\/h3>\n<\/div>\n<p><span style=\"font-family: futural;\">DetectGPT\u2019s success rate is 85% across a range of samples despite being a general LLM detection tool. 
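<\/span><\/p>
<p><span style=\"font-family: futural;\">The perturbation test described above can be sketched in a few lines of Python. The log-probability scores and the decision threshold below are invented for illustration; the actual DetectGPT method scores text with the source model, generates the rewrites with a separate mask-filling model, and normalizes the discrepancy before thresholding it.<\/span><\/p>

```python
import statistics

def perturbation_discrepancy(logp_original, logp_perturbed):
    # How far the original text's average log probability sits above
    # the mean over its perturbed rewrites.
    return logp_original - statistics.mean(logp_perturbed)

def looks_machine_generated(logp_original, logp_perturbed, threshold=0.1):
    # Machine text tends to sit near a local maximum of log probability:
    # nearly every rewrite scores lower, so the discrepancy is clearly
    # positive. Human text drifts both up and down under rewriting.
    return perturbation_discrepancy(logp_original, logp_perturbed) > threshold

# Invented scores: every perturbation of the machine text drops, while
# the human text's perturbations move in both directions.
print(looks_machine_generated(-1.20, [-1.55, -1.48, -1.60]))  # True
print(looks_machine_generated(-1.20, [-1.10, -1.35, -1.18]))  # False
```

<p><span style=\"font-family: futural;\">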
By contrast, LLM detectors trained to identify the presence of an LLM in specific texts do perform better. But when they are presented with new texts, they significantly underperform. Overall, the success rate of machine-learning-trained models is also 85%, but the standard deviation of their success is much higher. That is, DetectGPT consistently performs well, but trained models only perform well on the datasets they were trained on.<\/span><\/p>\n<h3><span style=\"font-family: futural;\"><strong class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-3v c2-66\">Conclusion<\/strong><\/span><\/h3>\n<p><span style=\"font-family: futural;\">Detecting whether or not a document has been authored by an LLM is very difficult for people. But it is not a difficult problem for zero-shot models. The reason is that LLMs seek to provide you with their best answer\/attempt the first time. Asking one to rewrite an answer or an output almost always results in a lower-probability output. By contrast, rewrites done by people can have a higher or lower average token log-probability.<\/span><\/p>\n<hr class=\"x-el x-el-hr c2-1 c2-2 c2-6j c2-6k c2-4q c2-29 c2-2b c2-k c2-3 c2-4 c2-5 c2-6 c2-7 c2-8\" \/>\n<p><span style=\"font-family: futural;\"><a class=\"x-el x-el-a c2-2w c2-2x c2-69 c2-v c2-w c2-x c2-j c2-6a c2-3 c2-30 c2-31 c2-11 c2-32\" href=\"https:\/\/blogging.godaddy.com\/blog\/a6d795a4-a672-4120-a6ba-07384a52a2d8\/posts\/8fa55365-ec86-40d7-bb2a-d6b9b186228d#_ednref1\" rel=\"\">[i]<\/a> Jakesch, Maurice, Jeffrey T. Hancock, and Mor Naaman. \u201cHuman heuristics for AI-generated language are flawed.\u201d\u00a0<em class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-31 c2-66 c2-67\">PNAS<\/em>\u00a0(March 7, 2023) Vol. 120, No. 
11\u00a0<\/span><\/p>\n<p><span style=\"font-family: futural;\"><a class=\"x-el x-el-a c2-2w c2-2x c2-69 c2-v c2-w c2-x c2-j c2-6a c2-3 c2-30 c2-31 c2-11 c2-32\" href=\"https:\/\/blogging.godaddy.com\/blog\/a6d795a4-a672-4120-a6ba-07384a52a2d8\/posts\/8fa55365-ec86-40d7-bb2a-d6b9b186228d#_ednref2\" rel=\"\">[ii]<\/a> Mitchell, Eric, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, and Chelsea Finn. \u201cDetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature.\u201d (26 January 2023) arXiv:2301.11305v1<\/span><\/p>\n<p><span style=\"font-family: futural;\"><a class=\"x-el x-el-a c2-2w c2-2x c2-69 c2-v c2-w c2-x c2-j c2-6a c2-3 c2-30 c2-31 c2-11 c2-32\" href=\"https:\/\/blogging.godaddy.com\/blog\/a6d795a4-a672-4120-a6ba-07384a52a2d8\/posts\/8fa55365-ec86-40d7-bb2a-d6b9b186228d#_ednref3\" rel=\"\">[iii]<\/a> Mok, Kimberley. \u201cGPTZero: An App to Detect AI Authorship.\u201d\u00a0<em class=\"x-el x-el-span c2-2w c2-2x c2-3 c2-65 c2-13 c2-31 c2-66 c2-67\">The New Stack<\/em> (1 February 2023)<\/span><\/p>\n<p>[\/et_pb_text][\/et_pb_column][\/et_pb_row][\/et_pb_section]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>An emerging narrative around ChatGPT and other Large Language Models (LLMs) is that given their output is so convincing and people-like, how can we hope to tell the difference between the two and to not be deceived? 
In fact,\u00a0recent research[i]found something, \u201ceye-opening: participants in the study could only distinguish between human or AI text with [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":14280,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"on","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"categories":[3,465],"tags":[458,462,461,459],"class_list":["post-14291","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-the-blog","category-d-a-t-a","tag-artificial-intelligence","tag-large-language-model","tag-llm","tag-machine-learning"],"_links":{"self":[{"href":"https:\/\/jasonapollovoss.com\/web\/wp-json\/wp\/v2\/posts\/14291","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jasonapollovoss.com\/web\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jasonapollovoss.com\/web\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jasonapollovoss.com\/web\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/jasonapollovoss.com\/web\/wp-json\/wp\/v2\/comments?post=14291"}],"version-history":[{"count":0,"href":"https:\/\/jasonapollovoss.com\/web\/wp-json\/wp\/v2\/posts\/14291\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/jasonapollovoss.com\/web\/wp-json\/wp\/v2\/media\/14280"}],"wp:attachment":[{"href":"https:\/\/jasonapollovoss.com\/web\/wp-json\/wp\/v2\/media?parent=14291"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jasonapollovoss.com\/web\/wp-json\/wp\/v2\/categories?post=14291"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jasonapollovoss.com\/web\/wp-json\/wp\/v2\/tags?post=14291"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}