Bioinformatics
v1.0.0
Analyze DNA, RNA, and protein sequences using alignment, variant calling, and expression analysis pipelines.
Setup
On first use, read setup.md for integration guidelines. Create ~/bioinformatics/ with user consent to store project context and preferences.
When to Use
User needs to analyze biological sequences, run genomic pipelines, or interpret sequencing data. Agent handles sequence alignment, variant calling, expression analysis, and format conversions.
Architecture
Memory lives in ~/bioinformatics/. See memory-template.md for structure.
~/bioinformatics/
├── memory.md      # Projects, preferences, reference genomes
├── pipelines/     # Saved pipeline configurations
└── results/       # Analysis outputs and logs
Quick Reference
Topic              File
Setup process      setup.md
Memory template    memory-template.md
File formats       formats.md
Tool commands      tools.md
RNA-seq pipeline   rnaseq.md
Variant calling    variants.md
Core Rules
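- Validate Input Quality First
Before any analysis, check input data quality:
  - FASTQ: Run FastQC, check per-base quality and adapter content
  - BAM: Verify the file is intact (samtools quickcheck), coordinate-sorted, and indexed
  - VCF: Validate format (bcftools view -h)
Bad input → garbage output. Always QC first.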
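As a lightweight complement to the QC step above (not a replacement for FastQC), a structural check can catch truncated or malformed FASTQ files early. The following is a minimal sketch; `check_fastq` is a hypothetical helper name, and it assumes the common four-lines-per-record FASTQ layout with no wrapped sequence lines.

```shell
# Hypothetical helper (not part of the skill): structural check of an
# uncompressed FASTQ file. Returns 0 if every record has four lines, an
# "@" header, a "+" separator, and equal sequence/quality lengths.
check_fastq() {
  awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { print "line " NR ": header must start with @"; bad = 1; exit }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { print "line " NR ": separator must start with +"; bad = 1; exit }
    NR % 4 == 0 && length($0) != seqlen    { print "line " NR ": quality length differs from sequence length"; bad = 1; exit }
    END {
      if (!bad && (NR == 0 || NR % 4 != 0)) { print "file is empty or ends mid-record"; bad = 1 }
      exit bad
    }
  ' "$1"
}

# For gzipped input, decompress to a stream first, e.g.:
#   gzip -dc sample.fastq.gz | check_fastq -
```

A non-zero exit status means the file should not be passed on to trimming or alignment until the problem is understood.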
- Use Reference Genome Consistently
Track which reference is used per project:
  - Human: GRCh38/hg38 (preferred) or GRCh37/hg19
  - Mouse: GRCm39/mm39 or GRCm38/mm10
  - Mixing references = invalid results
Store reference info in ~/bioinformatics/memory.md per project.
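One cheap consistency check is to compare the contig naming style of each file in a project, since the same assembly is distributed both with and without the `chr` prefix. The sketch below assumes contig names arrive one per line on stdin; `contig_style` is a hypothetical helper name. It only detects the naming style and cannot tell GRCh38 from GRCh37.

```shell
# Hypothetical helper: classify contig names read from stdin as
# "chr" (all prefixed), "plain" (none prefixed), "mixed", or "empty".
contig_style() {
  awk '
    /^chr/ { chr++; next }
    NF     { plain++ }
    END {
      if (chr && plain) print "mixed"
      else if (chr)     print "chr"
      else if (plain)   print "plain"
      else              print "empty"
    }'
}

# Example inputs (the first column of a .fai index holds the contig name):
#   cut -f1 reference.fa.fai | contig_style
#   bcftools query -f '%CHROM\n' variants.vcf.gz | sort -u | contig_style
```

If the reference reports `chr` and a VCF or BED file reports `plain`, the files should be reconciled before any joint analysis.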
- Preserve Raw Data
NEVER modify original FASTQ/BAM files:
  - Work on copies
  - Keep originals read-only
  - Log every transformation step
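The three points above can be combined into one small routine. This is a sketch under stated assumptions: `make_working_copy` and the `steps.log` file name are hypothetical, and the log format (tab-separated timestamp, action, source, destination) is only one possible choice.

```shell
# Hypothetical helper: protect an original file and hand back a writable copy.
# Usage: make_working_copy <original> <work_dir>
make_working_copy() {
  src=$1
  work_dir=$2
  copy="$work_dir/$(basename "$src")"

  mkdir -p "$work_dir" || return 1
  cp "$src" "$copy" || return 1     # copy first, while the source is untouched
  chmod a-w "$src" || return 1      # original becomes read-only
  chmod u+w "$copy" || return 1     # copy stays writable for analysis

  # Append a one-line record of the step (timestamp, action, source, destination).
  printf '%s\tcopy\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$src" "$copy" \
    >> "$work_dir/steps.log"
}
```

For large BAM files a full copy may be too expensive; in that case keeping the original read-only and writing all outputs to a separate directory serves the same rule.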
- Resource Awareness
Bioinformatics commands can consume massive resources:
  - Check file sizes before operations
  - Use streaming when possible (samtools view | ...)
  - Estimate memory needs (BWA: ~6GB for human genome)
  - Warn before operations >10 minutes
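The first point, checking file sizes before an operation, might look like the following. `exceeds_size` is a hypothetical helper, and the 10 GB threshold in the usage comment is an arbitrary placeholder rather than a recommendation from this document.

```shell
# Hypothetical helper: succeed (exit 0) when a file exceeds a byte threshold,
# so the caller can warn before starting a long operation.
# Usage: exceeds_size <file> <threshold_bytes>
exceeds_size() {
  size=$(wc -c < "$1") || return 2   # 2 = file could not be read
  [ "$((size))" -gt "$2" ]
}

# Example: warn above a placeholder threshold of 10 GB.
# if exceeds_size aligned.bam 10000000000; then
#   echo "aligned.bam is large; consider streaming with 'samtools view | ...'" >&2
# fi
```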
- Reproducibility
Every analysis must be reproducible:
  - Log exact tool versions (samtools --version)
  - Save command parameters
  - Record input file checksums for critical analyses
Common Traps
  - Wrong chromosome naming: chr1 vs 1 causes silent failures. Check and convert with sed 's/^chr//'
  - Unsorted BAM: Most tools expect sorted input. Symptoms: errors or wrong results with no warning
  - Index missing: BAM needs .bai, VCF needs .tbi (or .csi). Commands fail cryptically without them
  - Memory exhaustion: Large BAM operations kill the session. Stream or use --threads wisely
  - Stale indices: After modifying BAM/VCF, regenerate the index. Old index = corrupt reads
  - 0-based vs 1-based coordinates: BED is 0-based, VCF/GFF is 1-based. Off-by-one bugs are common
File Formats Quick Reference
Format     Purpose                Key Tool
FASTA      Reference sequences    samtools faidx
FASTQ      Raw reads + quality    seqtk, fastp
SAM/BAM    Aligned reads          samtools
VCF/BCF    Variants               bcftools
BED        Genomic intervals      bedtools
GFF/GTF    Gene annotations       gffread
BigWig     Coverage tracks        deeptools
Essential Commands
Quality Control
    # FASTQ quality report (FastQC expects the output directory to exist)
    mkdir -p qc_reports/
    fastqc sample.fastq.gz -o qc_reports/
    # Trim adapters + low quality
    fastp -i R1.fq.gz -I R2.fq.gz -o R1.clean.fq.gz -O R2.clean.fq.gz

    # BAM statistics
    samtools flagstat aligned.bam
    samtools stats aligned.bam > stats.txt
Alignment
    # Index reference (once)
    bwa index reference.fa

    # Align paired-end reads and write a coordinate-sorted BAM
    bwa mem -t 8 reference.fa R1.fq.gz R2.fq.gz | \
      samtools sort -o aligned.bam -

    # Index BAM
    samtools index aligned.bam
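Because the reference only needs to be indexed once, a guard can skip the indexing step when the index files are already in place. The sketch below assumes classic `bwa`, whose `bwa index` writes `.amb`, `.ann`, `.bwt`, `.pac`, and `.sa` files next to the FASTA; other aligners use different file sets. `bwa_index_present` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if all five files written by `bwa index`
# exist next to the reference FASTA.
# Usage: bwa_index_present <reference.fa>
bwa_index_present() {
  for ext in amb ann bwt pac sa; do
    [ -f "$1.$ext" ] || return 1
  done
}

# Example: index only when needed.
#   bwa_index_present reference.fa || bwa index reference.fa
```

This checks for presence only. If the FASTA has been replaced since indexing, the index is stale and should be rebuilt regardless.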
Variant Calling
    # Call variants
    bcftools mpileup -Ou -f reference.fa aligned.bam | \
      bcftools call -mv -Oz -o variants.vcf.gz

    # Index VCF
    bcftools index variants.vcf.gz

    # Soft-filter variants: tag records with QUAL<20 as LowQual (VCF goes to stdout)
    bcftools filter -s LowQual -e 'QUAL<20' variants.vcf.gz
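With `-s`, the filter command above tags failing records in the FILTER column instead of removing them, so it is useful to see how many records ended up under each tag. A minimal sketch, assuming uncompressed VCF text on stdin; `filter_summary` is a hypothetical helper name.

```shell
# Hypothetical helper: count records per FILTER value (VCF column 7) in
# uncompressed VCF text read from stdin. Header lines ("#...") are skipped.
filter_summary() {
  awk -F '\t' '!/^#/ && NF >= 7 { n[$7]++ } END { for (f in n) print f "\t" n[f] }' | sort
}

# Example:
#   bcftools filter -s LowQual -e 'QUAL<20' variants.vcf.gz | filter_summary
```

The output is one tab-separated line per FILTER value, sorted by name, which makes it easy to log alongside the command parameters.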
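Data Manipulation
    # Extract region (requires an indexed BAM)
    samtools view -b aligned.bam chr1:1000000-2000000 > region.bam

    # Convert BAM to FASTQ (collate by read name first so mates stay paired)
    samtools collate -u -O aligned.bam | \
      samtools fastq -1 R1.fq.gz -2 R2.fq.gz -0 /dev/null -s /dev/null -n -

    # Merge BAMs
    samtools merge merged.bam sample1.bam sample2.bam

    # Subset VCF by region (requires an indexed VCF)
    bcftools view -r chr1:1000-2000 variants.vcf.gz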
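The region strings used above (`chr1:1000-2000`) are 1-based and inclusive, whereas BED intervals are 0-based with an exclusive end. When regions come from a BED file, the start needs to be shifted by one and the end kept as-is. A minimal sketch of that conversion; `bed_to_region` is a hypothetical helper name.

```shell
# Hypothetical helper: convert BED intervals (0-based start, end-exclusive)
# on stdin into 1-based inclusive region strings such as chr1:1000-2000.
# Comment, "track", and "browser" lines are skipped.
bed_to_region() {
  awk -F '\t' '!/^(#|track|browser)/ && NF >= 3 { printf "%s:%d-%d\n", $1, $2 + 1, $3 }'
}

# Example: the BED interval chr1 999 2000 covers 1-based positions 1000..2000.
#   bcftools view -r "$(printf 'chr1\t999\t2000\n' | bed_to_region)" variants.vcf.gz
```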
Security & Privacy
Data access:
  - Only reads files user explicitly provides as input
  - Writes outputs to directories user specifies
  - Stores preferences in ~/bioinformatics/ (with consent)
Data that stays local:
  - All sequence data processed locally
  - No external API calls for analysis
  - Pipeline configs in ~/bioinformatics/
This skill does NOT:
  - Upload sequence data anywhere
  - Access files without explicit user instruction
  - Infer or collect data beyond explicit inputs
  - Make network requests during analysis
Note: Installing tools (conda, brew) and downloading reference genomes requires internet access. These are user-initiated actions.
Related Skills
Install with ClawHub install if user confirms:
  - data-analysis: statistical interpretation
  - statistics: hypothesis testing
  - science: research methodology
Feedback
  - If useful: ClawHub star bioinformatics
  - Stay updated: ClawHub sync