How Originality.AI Operates
- Originality.AI uses machine learning (ML) to differentiate between AI-generated and human-written text.
- Our system employs a custom version of BERT built for classification; our engineers found that this more adaptable discriminative model architecture detects AI-generated text more effectively than a generative model does.
- The tool's underlying language model was built on a novel architecture and trained on 160GB of text data in a dual-phase regime involving both a generator and a discriminator model, with the discriminator proving pivotal for language modeling.
- Our training data was meticulously produced through various sampling techniques and human verification. To refine text generation and boost accuracy, our team applied methods such as temperature scaling, Top-K sampling, and nucleus (Top-p) sampling.
- For more insights into our technology and training processes, visit our blog.
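The sampling methods named above (temperature, Top-K, nucleus) are standard decoding controls for text generation. The following is a minimal sketch of how they combine, not Originality.AI's actual implementation; the function name and parameters are illustrative:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Sample a token id from raw logits using temperature scaling,
    Top-K filtering, and nucleus (Top-p) filtering. Illustrative only."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature

    # Top-K: keep only the k highest-scoring tokens.
    if top_k > 0:
        cutoff = np.sort(logits)[-top_k]
        logits = np.where(logits < cutoff, -np.inf, logits)

    # Softmax over the surviving logits.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Nucleus (Top-p): keep the smallest set of tokens whose
    # cumulative probability reaches top_p, then renormalize.
    if top_p < 1.0:
        order = np.argsort(probs)[::-1]
        cum = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cum, top_p) + 1]
        mask = np.zeros_like(probs)
        mask[keep] = probs[keep]
        probs = mask / mask.sum()

    return int(rng.choice(len(probs), p=probs))
```

Lower temperatures and smaller K/p values make generated text more predictable, which is part of what makes such output statistically detectable.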
Interpretation of Originality.AI Scores
- Understanding the distinction between Plagiarism Detection and AI detection is crucial, as they require different approaches.
- Plagiarism detection is straightforward and has been practiced online for years: if a segment of text is copied, it is plagiarism.
- AI detection, however, is more complex and probabilistic. For example, a 5% AI score with a 95% Human score means there is a 95% likelihood the text was human-written, not that 5% of the text is AI-generated.
- When evaluating Originality.AI scores, consider:
- The AI versus Human score represents the likelihood, according to our AI, of whether the content was AI-generated or human-written.
- For instance, a 10% AI and 90% Human score implies a 90% probability of human origin and 10% of AI creation, not that 90% of the text is human and 10% AI-generated.
- Publishers often treat a high Human score as a mark of originality, even when evaluating human-authored content.
- While our tool achieves over 94% accuracy on GPT-3, GPT-3.5, and ChatGPT, it is not infallible, and both false positives and false negatives occur.
- It’s more reliable to evaluate a series of articles for assessing a writer or service than to judge based on a single piece.
- Article length also influences the accuracy of the scores.
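The "judge a series, not a single piece" advice above can be made concrete. In this sketch, the scores are hypothetical numbers standing in for a detector's per-article P(human) output, not real Originality.AI results:

```python
# Hypothetical Human scores for a writer's batch of articles,
# expressed as P(human) in [0, 1]. Illustrative values only.
human_scores = [0.92, 0.88, 0.97, 0.40, 0.95]

average = sum(human_scores) / len(human_scores)
minimum = min(human_scores)

# One low score (0.40) may be a false positive; the average and
# minimum together give a more reliable picture of the writer.
print(f"Human average: {average:.0%}")  # Human average: 82%
print(f"Human minimum: {minimum:.0%}")  # Human minimum: 40%
```

Note that each score is a probability for the whole article, so averaging scores across articles is a rough aggregate, not a measure of how much of any text is AI-generated.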
So, what should your threshold be?
Here are some suggested benchmarks for different users:
Strategies & Recommended Thresholds
- Recommended threshold for ensuring purely human-generated content:
- Human Average: Above 90%
- Human Minimum: 65%
- Recommended threshold for sites allowing AI-assisted research:
- Human Average: 75%
- Human Minimum: 50%
- Recommended threshold for those using AI to enhance efficiency but wary of AI detection by search engines:
- Human Average: 60%
- Human Minimum: 50%
- Recommended threshold for sites using AI for content creation with editorial oversight:
- No specific target
- A minimum of 0% Human is acceptable
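The threshold pairs above (an average target plus a per-article minimum) amount to a simple acceptance policy. A minimal sketch, with a hypothetical helper name and scores on a 0-1 scale:

```python
def passes_policy(human_scores, avg_threshold, min_threshold):
    """Check a batch of Human scores (0-1) against a policy that sets
    both an average threshold and a per-article minimum. Illustrative."""
    avg = sum(human_scores) / len(human_scores)
    return avg >= avg_threshold and min(human_scores) >= min_threshold

# "Purely human-generated" policy from above: average above 90%, minimum 65%.
scores = [0.95, 0.91, 0.88, 0.97]
print(passes_policy(scores, avg_threshold=0.90, min_threshold=0.65))  # True
```

The last tier (AI content with editorial oversight) corresponds to `avg_threshold=0.0, min_threshold=0.0`, i.e. every batch passes.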