Comprehensive CLI for detecting and transforming AI-generated text to bypass detectors. Based on Wikipedia's Signs of AI Writing.
## Quick Start

```bash
# Detect AI patterns
python scripts/detect.py text.txt

# Transform to human-like text
python scripts/transform.py text.txt -o clean.txt

# Compare before/after
python scripts/compare.py text.txt -o clean.txt
```
## Detection Categories

The analyzer checks for 16 pattern categories from Wikipedia's guide, grouped by signal strength:

### Critical (Immediate AI Detection)

| Category | Examples |
|---|---|
| Citation Bugs | oaicite, turn0search, contentReference |
| Knowledge Cutoff | "as of my last training", "based on available information" |
| Chatbot Artifacts | "I hope this helps", "Great question!", "As an AI" |
| Markdown | `**bold**`, `## headers`, `` ``` `` code blocks |
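Because the critical categories are near-literal strings, a plain regular-expression pass is enough to catch them. A minimal sketch, assuming Python 3.9+; the `CRITICAL_PATTERNS` regexes and the `scan_critical` helper are hypothetical illustrations, not the code in `detect.py`:

```python
import re

# Hypothetical regexes for the four "Critical" categories.
CRITICAL_PATTERNS = {
    "citation_bugs": re.compile(r"oaicite|turn0search\d*|contentReference", re.IGNORECASE),
    "knowledge_cutoff": re.compile(
        r"as of my last (?:training|update)|based on available information", re.IGNORECASE
    ),
    "chatbot_artifacts": re.compile(r"I hope this helps|Great question!|As an AI\b", re.IGNORECASE),
    # **bold**, a heading hash at line start, or a code fence at line start
    "markdown": re.compile(r"\*\*[^*\n]+\*\*|^#{1,6} |^```", re.MULTILINE),
}


def scan_critical(text: str) -> dict[str, list[str]]:
    """Return the matched strings for each critical category that fires."""
    hits: dict[str, list[str]] = {}
    for name, pattern in CRITICAL_PATTERNS.items():
        found = [m.group(0) for m in pattern.finditer(text)]
        if found:
            hits[name] = found
    return hits


sample = "Great question! **Paris** is the capital.\n## History\nSee oaicite:0 for details."
print(scan_critical(sample))
```

Any hit in one of these categories is enough on its own, which is why they sit at the top of the scoring table.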
### High Signal

| Category | Examples |
|---|---|
| AI Vocabulary | delve, tapestry, landscape, pivotal, underscore, foster |
| Significance Inflation | "serves as a testament", "pivotal moment", "indelible mark" |
| Promotional Language | vibrant, groundbreaking, nestled, breathtaking |
| Copula Avoidance | "serves as" instead of "is", "boasts" instead of "has" |
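Vocabulary flags need word boundaries so that a short entry does not fire in the middle of another word. This sketch uses a hypothetical `count_ai_vocabulary` helper, and its word list is copied from the table rather than read from `patterns.json`. It matches each entry as the start of a whole word so that common inflections are counted too:

```python
import re

# Word list copied from the table above; the real list lives in patterns.json.
AI_VOCABULARY = ["delve", "tapestry", "landscape", "pivotal", "underscore", "foster"]


def count_ai_vocabulary(text: str, words: list[str] = AI_VOCABULARY) -> dict[str, int]:
    """Count whole words that start with each flagged entry (case-insensitive)."""
    counts: dict[str, int] = {}
    for word in words:
        # \b anchors the start; \w* lets "fosters" and "fostering" count as "foster".
        pattern = re.compile(rf"\b{re.escape(word)}\w*\b", re.IGNORECASE)
        n = len(pattern.findall(text))
        if n:
            counts[word] = n
    return counts


print(count_ai_vocabulary("We delve into a rich tapestry. Fostering trust, it fosters growth."))
```

Prefix matching is a deliberate trade-off. It catches "fosters" and "fostering" but misses stem changes such as "delving", which would need a stemmer or an explicit list of variants.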
### Medium Signal

| Category | Examples |
|---|---|
| Superficial -ing | "highlighting the importance", "fostering collaboration" |
| Filler Phrases | "in order to", "due to the fact that", "Additionally," |
| Vague Attributions | "experts believe", "industry reports suggest" |
| Challenges Formula | "Despite these challenges", "Future outlook" |
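The medium-signal categories are fuzzier. The heuristics below are illustrative only and are not `detect.py`'s rules. They flag a comma followed by a trailing participle clause, plus a few stock attribution phrases. Expect false positives such as ", during the war.":

```python
import re

# Heuristic: a comma, then a word ending in -ing, then the rest of the sentence.
TRAILING_ING = re.compile(r",\s+(\w+ing)\b[^.!?]*[.!?]")
# A few stock attribution phrases; real coverage would need a longer list.
VAGUE_ATTRIBUTION = re.compile(
    r"\b(?:experts|observers|critics) (?:believe|argue|say)\b|\bindustry reports suggest\b",
    re.IGNORECASE,
)


def medium_signals(text: str) -> dict[str, list[str]]:
    """Return the participle words and attribution phrases that were flagged."""
    return {
        "superficial_ing": TRAILING_ING.findall(text),
        "vague_attributions": [m.group(0) for m in VAGUE_ATTRIBUTION.finditer(text)],
    }


print(medium_signals("The bridge opened in 1932, highlighting the importance of trade."))
```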
### Style Signal

| Category | Examples |
|---|---|
| Curly Quotes | `“”` instead of `""` (ChatGPT signature) |
| Em Dash Overuse | Excessive use of em dashes (`—`) for emphasis |
| Negative Parallelisms | "Not only... but also", "It's not just... it's" |
| Rule of Three | Forced triplets like "innovation, inspiration, and insight" |
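Style signals are better measured as counts or rates than as single hits, because one em dash proves nothing. The sketch below uses a hypothetical `style_signals` helper. Reporting em dashes per 100 words is an assumed normalization, not the tool's documented metric:

```python
import re

# Left/right double and single curly quotes. U+2019 is also the curly
# apostrophe, so contractions typed with it are counted as well.
CURLY = "\u201c\u201d\u2018\u2019"
EM_DASH = "\u2014"
NEGATIVE_PARALLEL = re.compile(
    r"\bnot only\b.*?\bbut also\b|\bit['\u2019]s not just\b", re.IGNORECASE
)


def style_signals(text: str) -> dict[str, float]:
    """Count curly quotes and negative parallelisms; give em dashes as a rate."""
    words = len(text.split()) or 1  # avoid dividing by zero on empty input
    return {
        "curly_quotes": sum(text.count(c) for c in CURLY),
        "em_dash_per_100_words": 100 * text.count(EM_DASH) / words,
        "negative_parallelisms": len(NEGATIVE_PARALLEL.findall(text)),
    }


print(style_signals("It\u2019s not just a tool \u2014 it\u2019s a movement."))
```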
## Scripts

### detect.py: Scan for AI Patterns

```bash
python scripts/detect.py essay.txt
python scripts/detect.py essay.txt -j   # JSON output
python scripts/detect.py essay.txt -s   # score only
echo "text" | python scripts/detect.py  # read from stdin
```

Output:
- Issue count and word count
- AI probability (low/medium/high/very high)
- Breakdown by category
- Auto-fixable patterns marked
### transform.py: Rewrite Text

```bash
python scripts/transform.py essay.txt
python scripts/transform.py essay.txt -o output.txt
python scripts/transform.py essay.txt -a   # aggressive
python scripts/transform.py essay.txt -q   # quiet
```

Auto-fixes:
- Citation bugs (oaicite, turn0search)
- Markdown (`**bold**`, `##` headers, `` ``` `` code fences)
- Chatbot sentences
- Copula avoidance → "is/has"
- Filler phrases → simpler forms
- Curly → straight quotes

Aggressive mode (`-a`) also:
- Simplifies -ing clauses
- Reduces em dashes
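Most of the auto-fixes are table-driven substitutions. The sketch below approximates four of them: Markdown, chatbot sentences, copula and filler phrases, and curly quotes. The replacement tables and helper names are hypothetical stand-ins for `patterns.json` and `transform.py`. Splitting sentences on `.`, `!` and `?` is a simplification that will mis-split abbreviations such as "e.g.":

```python
import re

# Hypothetical stand-ins for the patterns.json tables.
COPULA_AVOIDANCE = {"serves as": "is", "boasts": "has"}
FILLER_REPLACEMENTS = {"due to the fact that": "because", "in order to": "to"}
CHATBOT_ARTIFACTS = ["i hope this helps", "great question"]  # lowercase for matching
CURLY_TO_STRAIGHT = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})


def _replace_keep_case(text: str, phrase: str, simple: str) -> str:
    """Replace whole-word `phrase` with `simple`, capitalizing it if the match was."""
    def repl(m):
        return simple[0].upper() + simple[1:] if m.group(0)[0].isupper() else simple
    return re.sub(rf"\b{re.escape(phrase)}\b", repl, text, flags=re.IGNORECASE)


def _drop_chatbot_sentences(line: str) -> str:
    """Remove every sentence in `line` that contains a chatbot artifact."""
    sentences = re.split(r"(?<=[.!?])\s+", line)
    return " ".join(s for s in sentences
                    if not any(a in s.lower() for a in CHATBOT_ARTIFACTS))


def auto_fix(text: str) -> str:
    # Markdown: unwrap **bold** and strip leading heading hashes.
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)
    text = re.sub(r"^#{1,6}[ \t]+", "", text, flags=re.MULTILINE)
    # Chatbot sentences, handled line by line so line breaks survive.
    text = "\n".join(_drop_chatbot_sentences(line) for line in text.split("\n"))
    # Copula avoidance, then filler phrases.
    for table in (COPULA_AVOIDANCE, FILLER_REPLACEMENTS):
        for phrase, simple in table.items():
            text = _replace_keep_case(text, phrase, simple)
    # Curly quotes to straight quotes.
    return text.translate(CURLY_TO_STRAIGHT)


print(auto_fix("Great question! The museum serves as a hub."))
```

Working line by line keeps paragraph breaks intact. The case-preserving replacement avoids leaving a lowercase "because" at the start of a sentence.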
### compare.py: Before/After Analysis

```bash
python scripts/compare.py essay.txt
python scripts/compare.py essay.txt -a -o clean.txt
```

Shows detection scores side by side, before and after transformation.
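A before/after comparison only needs the same scorer run twice. In this sketch the four-pattern `count_issues` scorer is a deliberately tiny, hypothetical stand-in for the full detector, and `compare` is not the script's actual function:

```python
import re

# Tiny hypothetical scorer: one issue per match of any pattern.
PATTERNS = [r"\bserves as\b", r"\bin order to\b", r"\bdelve\w*", "[\u201c\u201d]"]


def count_issues(text: str) -> int:
    return sum(len(re.findall(p, text, re.IGNORECASE)) for p in PATTERNS)


def compare(before: str, after: str) -> dict[str, tuple[int, int]]:
    """Print a two-column before/after table and return the raw numbers."""
    stats = {
        "issues": (count_issues(before), count_issues(after)),
        "words": (len(before.split()), len(after.split())),
    }
    print(f"{'metric':<8}{'before':>8}{'after':>8}")
    for name, (b, a) in stats.items():
        print(f"{name:<8}{b:>8}{a:>8}")
    return stats


compare("The hall serves as a venue in order to delve into \u201cart\u201d.",
        'The hall is a venue to explore "art".')
```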
## Workflow

1. Scan for detection risk:
   ```bash
   python scripts/detect.py document.txt
   ```
2. Transform with comparison:
   ```bash
   python scripts/compare.py document.txt -o document_v2.txt
   ```
3. Verify improvement:
   ```bash
   python scripts/detect.py document_v2.txt -s
   ```
4. **Manual review** for AI vocabulary and promotional language (requires human judgment).
## AI Probability Scoring

| Rating | Criteria |
|---|---|
| Very High | Citation bugs, knowledge cutoff, or chatbot artifacts present |
| High | >30 issues OR >5% issue density |
| Medium | >15 issues OR >2% issue density |
| Low | ≤15 issues AND ≤2% issue density |
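The rating rules in the table translate directly into an if/elif chain that is checked from most to least severe. This sketch assumes that issue density means issues per 100 words (a percentage). `ai_probability` is a hypothetical name, not the tool's function:

```python
def ai_probability(issues: int, words: int, critical_found: bool) -> str:
    """Map issue counts to a rating, checking the most severe rule first."""
    if critical_found:
        return "very high"
    # Assumed definition: issues per 100 words; 0 for empty input.
    density = 100 * issues / words if words else 0.0
    if issues > 30 or density > 5:
        return "high"
    if issues > 15 or density > 2:
        return "medium"
    return "low"  # at most 15 issues and at most 2% density


print(ai_probability(issues=12, words=400, critical_found=False))
```

Because each branch only runs when the previous ones failed, "low" covers exactly the texts with at most 15 issues and at most 2% density.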
## Customizing Patterns

Edit `scripts/patterns.json` to add or modify:

- `ai_vocabulary`: words to flag
- `significance_inflation`: puffery phrases
- `promotional_language`: marketing speak
- `copula_avoidance`: phrase → replacement
- `filler_replacements`: phrase → simpler form
- `chatbot_artifacts`: phrases that trigger removal of the whole sentence
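The file's exact schema is not documented here. The sketch below assumes that the flag-only categories are JSON arrays and that the "phrase → replacement" categories are JSON objects, and adds a basic shape check to the loader. `parse_patterns`, `load_patterns` and the example contents are all hypothetical:

```python
import json
from pathlib import Path

# Hypothetical contents, shaped to match the assumed schema.
EXAMPLE_PATTERNS = {
    "ai_vocabulary": ["delve", "tapestry"],
    "significance_inflation": ["serves as a testament", "pivotal moment"],
    "promotional_language": ["vibrant", "nestled"],
    "copula_avoidance": {"serves as": "is", "boasts": "has"},
    "filler_replacements": {"in order to": "to"},
    "chatbot_artifacts": ["I hope this helps"],
}

FLAG_LISTS = ("ai_vocabulary", "significance_inflation", "promotional_language", "chatbot_artifacts")
REPLACEMENT_MAPS = ("copula_avoidance", "filler_replacements")


def parse_patterns(raw: str) -> dict:
    """Parse patterns JSON and check each known key has the assumed type."""
    data = json.loads(raw)
    for key in FLAG_LISTS:
        if not isinstance(data.get(key, []), list):
            raise ValueError(f"{key} should be a list of phrases")
    for key in REPLACEMENT_MAPS:
        if not isinstance(data.get(key, {}), dict):
            raise ValueError(f"{key} should map phrases to replacements")
    return data


def load_patterns(path: str = "scripts/patterns.json") -> dict:
    return parse_patterns(Path(path).read_text(encoding="utf-8"))


print(parse_patterns(json.dumps(EXAMPLE_PATTERNS))["copula_avoidance"])
```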
## Batch Processing

```bash
# Scan all text files
for f in *.txt; do
  echo "=== $f ==="
  python scripts/detect.py "$f" -s
done

# Transform all Markdown files
for f in *.md; do
  python scripts/transform.py "$f" -a -o "${f%.md}_clean.md" -q
done
```
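If you prefer to drive the batch from Python, `pathlib` can express the same naming rule as `${f%.md}_clean.md`. This hypothetical sketch only builds the command lines. It also skips files whose names already end in `_clean`, so a second run does not produce `essay_clean_clean.md`:

```python
from pathlib import Path


def clean_name(path: Path) -> Path:
    """Equivalent of the shell's "${f%.md}_clean.md" for a .md path."""
    return path.with_name(f"{path.stem}_clean{path.suffix}")


def batch_commands(folder: Path) -> list[list[str]]:
    """Build one transform.py command per Markdown file, skipping earlier outputs."""
    commands = []
    for f in sorted(folder.glob("*.md")):
        if f.stem.endswith("_clean"):
            continue  # output of a previous run
        commands.append(
            ["python", "scripts/transform.py", str(f), "-a", "-o", str(clean_name(f)), "-q"]
        )
    return commands


print(clean_name(Path("notes/essay.md")))
```

You can then pass each list to `subprocess.run(cmd, check=True)`.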
## Reference
Based on Wikipedia's Signs of AI Writing, maintained by WikiProject AI Cleanup. Patterns documented from thousands of AI-generated text examples.
Key insight: "LLMs use statistical algorithms to guess what should come next. The result tends toward the most statistically likely result that applies to the widest variety of cases."