General Updates:
Thank you sincerely for signing up for the GPTZero beta. I’m completely awestruck by the support this app has generated. In the past day, 4,000+ people have signed up for the beta (via this Substack) and 10,000+ more have tried it out on the Streamlit version, not to mention the 5M+ who read the original tweet.
GPTzero.me is back up and running for everyone. Feel free to try it yourself now. Huge shoutout to the Streamlit team, which generously gave an incredible boost to GPTZero’s hosting and memory capabilities.
Algorithm Updates:
Within the past few hours, I’ve finished updating the GPTZero model to significantly reduce the rate of false positives and improve output results. It will be shipping to production!
The original model is a linear regression, a simple calculation over perplexity, burstiness, and the other variables. The improved model is a logistic regression model that uses the exact same variables and inputs but gives a more nuanced classification. Tested on a dataset of BBC news articles (Greene et al.) plus AI-generated articles written from the same headlines as prompts, the improved model has a false positive rate under 2%.
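For readers curious what a logistic regression over these kinds of variables might look like, here is a minimal sketch in plain Python. It is not the GPTZero code: the two features (one perplexity score and one burstiness score per document), the toy numbers, the assumption that human text scores higher on both, and the helper names `fit_logistic`, `predict`, and `false_positive_rate` are all made up for illustration. The false positive rate here means the share of human-written documents that get flagged as AI.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def score(w, b, x):
    # Linear combination of the features plus a bias term.
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    # 1 = flagged as AI-generated, 0 = treated as human-written.
    return 1 if sigmoid(score(w, b, x)) >= threshold else 0

def false_positive_rate(y_true, y_pred):
    # Fraction of human documents (label 0) wrongly flagged as AI (label 1).
    negatives = sum(1 for t in y_true if t == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fp / negatives if negatives else 0.0

# Invented toy data: (perplexity score, burstiness score), already scaled.
# ASSUMPTION: human text scores higher on both; these are not real measurements.
HUMAN = [(2.0, 1.5), (1.8, 1.2), (2.2, 1.7), (1.6, 1.4)]
AI = [(0.5, 0.3), (0.7, 0.4), (0.4, 0.2), (0.8, 0.5)]
X = HUMAN + AI
y = [0] * len(HUMAN) + [1] * len(AI)

w, b = fit_logistic(X, y)
preds = [predict(w, b, x) for x in X]
print("false positive rate on toy data:", false_positive_rate(y, preds))
```

The `threshold` argument is where the trade-off lives in a sketch like this: raising it flags fewer human documents at the cost of missing more AI-generated ones. A rate measured on the training data, as above, says little on its own; a meaningful figure would come from held-out text.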
For more on the precise variables and factors used in GPTZero, here’s an article from the Daily Beast that explained it quite well today.
Future Features:
Over the coming months, I’ll be completely focused on building GPTZero, improving the model capabilities, and scaling the app out fully! Really excited for what awaits.
Do you have feedback on what features you would like to see next? Please let me know.
I tested it out, and it does work for texts that are generated entirely by GPT models or generated with some human intervention. That said, it does not work well with essays written by good writers. It falsely flagged many such essays as AI-written. This is at once a VERY useful tool for professors and a very dangerous one: trusting it too much would make the problem of false flags worse.
To Edward: Please make sure the model has a false flag rate of <1-2% on all types of content: articles, very poor essays, good essays, stories, etc. For example, my college essays were falsely flagged multiple times, even though I didn't use ChatGPT or any other language model. I use a thesaurus and Grammarly, and that's about it. I urge you to train it on a dataset that accounts for every type of content available. As a high school student, I especially want to emphasize training it on very good essays because, yes, a lot of students will use GPT to elevate their writing, but some are honest in their essays, and the model does not seem to take that into account.
Can I have an API please :-)