AI Writing Is Improving, But It Still Can't Match Human Creativity

Saturday December 21, 2024. 02:00 PM , from Slashdot

sciencehabit shares a report from Science Magazine: With a few keystrokes, anyone can ask an artificial intelligence (AI) program such as ChatGPT to write them a term paper, a rap song, or a play. But don't expect William Shakespeare's originality. A new study finds such output remains derivative -- at least for now. [O]bjectively testing this creativity has been tricky. Scientists have generally taken two tacks. One is to use another computer program to search for signs of plagiarism -- though a lack of plagiarism does not necessarily equal creativity. The other approach is to have humans judge the AI output themselves, rating factors such as fluency and originality. But that's subjective and time intensive. So Ximing Lu, a computer scientist at the University of Washington, and colleagues created a program featuring both objectivity and a bit of nuance.

Called DJ Search, it collects pieces of text of a minimum length from whatever the AI outputs and searches for them in large online databases. DJ Search doesn't just look for identical matches; it also scans for strings whose words have similar meanings. To evaluate the meaning of a word or phrase, the program itself relies on a separate AI algorithm that produces a set of numbers called an 'embedding,' which roughly represents the contexts in which words are typically found. Synonymous words have numerically close embeddings. For example, phrases that swap 'anticipation' and 'excitement' are considered matches. After removing all matches, the program calculates the ratio of the remaining words to the original document length, which should give an estimate of how much of the AI's output is novel. The program conducts this process for various string lengths (the study uses a minimum of five words) and combines the ratios into one index of linguistic novelty. (The team calls it a 'creativity index,' but creativity requires both novelty and quality -- random gibberish is novel but not creative.)

The researchers compared the linguistic novelty of published novels, poetry, and speeches with works written by recent LLMs. Humans outscored AIs by about 80% in poetry, 100% in novels, and 150% in speeches, the researchers report in a preprint posted on OpenReview and currently under peer review. Although DJ Search was designed for comparing people and machines, it can also be used to compare two or more humanmade works. For example, Suzanne Collins's 2008 novel The Hunger Games scored 35% higher in linguistic originality than Stephenie Meyer's 2005 hit Twilight. (You can try the tool online.)

Read more of this story at Slashdot.