# LLM Data Automation for Construction
## Overview

Based on DDC methodology (Chapter 2.3), this skill enables automation of construction data processing using Large Language Models (LLMs). Instead of manually coding data transformations, you describe what you need in natural language, and the LLM generates the necessary Python/Pandas code.

**Book Reference:** "Pandas DataFrame и LLM ChatGPT" / "Pandas DataFrame and LLM ChatGPT"

> "LLM models such as ChatGPT and LLaMA allow specialists without deep programming knowledge to contribute to the automation and improvement of a company's business processes."
> — DDC Book, Chapter 2.3
## Quick Start

### Option 1: Use ChatGPT/Claude Online

Simply describe your data processing task in natural language:

```
Prompt: "Write Python code to read an Excel file with construction materials,
filter rows where quantity > 100, and save to CSV."
```
### Option 2: Run a Local LLM (Ollama)

```bash
# Install Ollama from ollama.com
ollama pull mistral
# Run a query
ollama run mistral "Write Pandas code to calculate total cost from quantity * unit_price"
```
### Option 3: Use LM Studio (GUI)

- Download from lmstudio.ai
- Install and select a model (e.g., Mistral, LLaMA)
- Start chatting with your local AI
## Core Concepts

### DataFrame as a Universal Format

```python
import pandas as pd

# Construction project as a DataFrame
# Rows = elements, Columns = attributes
df = pd.DataFrame({
    'element_id': ['W001', 'W002', 'C001'],
    'category': ['Wall', 'Wall', 'Column'],
    'material': ['Concrete', 'Brick', 'Steel'],
    'volume_m3': [45.5, 32.0, 8.2],
    'cost_per_m3': [150, 80, 450]
})

# Calculate total cost
df['total_cost'] = df['volume_m3'] * df['cost_per_m3']
print(df)
```
## LLM Prompts for Construction Tasks

**Data Import:**

```
"Write code to import an Excel file with a construction schedule,
parse dates, and create a Pandas DataFrame"
```

**Data Filtering:**

```
"Filter construction elements where category is 'Structural'
and cost exceeds the budget limit of 50000"
```

**Data Aggregation:**

```
"Group construction data by floor level,
calculate total volume and cost for each floor"
```

**Report Generation:**

```
"Create a summary report with material quantities grouped by category,
export to Excel with formatting"
```
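As a concrete illustration, the "Data Aggregation" prompt typically yields code along these lines. The sample data and the column names `level`, `volume_m3`, and `cost` are assumptions made for the sketch, not part of any specific export format:

```python
import pandas as pd

# Hypothetical per-element data for the sketch
df = pd.DataFrame({
    'level': ['L1', 'L1', 'L2'],
    'volume_m3': [10.0, 5.0, 8.0],
    'cost': [1500, 400, 3600]
})

# Group by floor level and total volume and cost per floor
per_floor = df.groupby('level')[['volume_m3', 'cost']].sum()
print(per_floor)
```

Running the prompt against your own column names is usually just a matter of stating them explicitly in the prompt text.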
## Common Use Cases

### 1. Extract Data from PDF Documents

```python
# Prompt to ChatGPT:
# "Write code to extract tables from PDF and convert to DataFrame"
import pdfplumber
import pandas as pd

def pdf_to_dataframe(pdf_path):
    """Extract tables from a PDF file."""
    all_tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            tables = page.extract_tables()
            for table in tables:
                if table:
                    df = pd.DataFrame(table[1:], columns=table[0])
                    all_tables.append(df)
    if all_tables:
        return pd.concat(all_tables, ignore_index=True)
    return pd.DataFrame()

# Usage
df = pdf_to_dataframe("construction_spec.pdf")
df.to_excel("extracted_data.xlsx", index=False)
```
### 2. Process BIM Element Data

```python
# Prompt: "Analyze BIM elements, group by category, calculate volumes"
import pandas as pd

def analyze_bim_elements(csv_path):
    """Analyze BIM element data from a CSV export."""
    df = pd.read_csv(csv_path)
    # Group by category and summarize
    summary = df.groupby('Category').agg({
        'Volume': 'sum',
        'Area': 'sum',
        'ElementId': 'count'
    }).rename(columns={'ElementId': 'Count'})
    return summary

# Usage
summary = analyze_bim_elements("revit_export.csv")
print(summary)
```
### 3. Cost Estimation Pipeline

```python
# Prompt: "Create cost estimation from quantities and unit prices"
import pandas as pd

def calculate_cost_estimate(quantities_df, prices_df):
    """
    Calculate a project cost estimate.

    Args:
        quantities_df: DataFrame with columns [item_code, quantity]
        prices_df: DataFrame with columns [item_code, unit_price, unit]

    Returns:
        DataFrame with cost calculations
    """
    # Merge quantities with prices
    result = quantities_df.merge(prices_df, on='item_code', how='left')
    # Calculate costs
    result['total_cost'] = result['quantity'] * result['unit_price']
    # Add each item's share of the total cost
    result['cost_percentage'] = (result['total_cost'] /
                                 result['total_cost'].sum() * 100).round(2)
    return result

# Usage
quantities = pd.DataFrame({
    'item_code': ['C001', 'S001', 'W001'],
    'quantity': [150, 2000, 500]
})
prices = pd.DataFrame({
    'item_code': ['C001', 'S001', 'W001'],
    'unit_price': [120, 45, 85],
    'unit': ['m3', 'kg', 'm2']
})
estimate = calculate_cost_estimate(quantities, prices)
print(estimate)
```
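One pitfall worth noting with a `how='left'` merge like the one above: items without a matching price row come through with `NaN` unit prices and silently produce `NaN` total costs. A small check after the merge catches them. The two frames below are hypothetical inputs constructed for the sketch:

```python
import pandas as pd

# Hypothetical inputs: item code 'X999' has no price entry
quantities = pd.DataFrame({'item_code': ['C001', 'X999'], 'quantity': [150, 10]})
prices = pd.DataFrame({'item_code': ['C001'], 'unit_price': [120]})

result = quantities.merge(prices, on='item_code', how='left')

# Rows with no matching price show up as NaN after a left merge
unpriced = result[result['unit_price'].isna()]
print("Items missing a price:", unpriced['item_code'].tolist())
```

Flagging these rows before summing keeps a missing price from being mistaken for a zero cost.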
### 4. Schedule Data Processing

```python
# Prompt: "Parse construction schedule, calculate durations, identify delays"
import pandas as pd

def analyze_schedule(schedule_path):
    """Analyze a construction schedule for delays."""
    df = pd.read_excel(schedule_path)
    # Parse dates
    df['start_date'] = pd.to_datetime(df['start_date'])
    df['end_date'] = pd.to_datetime(df['end_date'])
    df['actual_end'] = pd.to_datetime(df['actual_end'])
    # Calculate durations
    df['planned_duration'] = (df['end_date'] - df['start_date']).dt.days
    df['actual_duration'] = (df['actual_end'] - df['start_date']).dt.days
    # Identify delays
    df['delay_days'] = df['actual_duration'] - df['planned_duration']
    df['is_delayed'] = df['delay_days'] > 0
    return df

# Usage
schedule = analyze_schedule("project_schedule.xlsx")
delayed_tasks = schedule[schedule['is_delayed']]
print(f"Delayed tasks: {len(delayed_tasks)}")
```
## Local LLM Setup (No Internet Required)

### Using Ollama

```bash
# Install
curl -fsSL https://ollama.com/install.sh | sh

# Download models
ollama pull mistral          # General purpose, 7B parameters
ollama pull codellama        # Code-focused
ollama pull deepseek-coder   # Strong at coding tasks

# Run
ollama run mistral "Write Pandas code to merge two DataFrames on project_id"
```
### Using LlamaIndex for Company Documents

```python
# Load company documents into a local, searchable index
# (recent versions of llama-index import these from llama_index.core instead)
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Read all PDFs from a folder
reader = SimpleDirectoryReader("company_documents/")
documents = reader.load_data()

# Create a searchable index
index = VectorStoreIndex.from_documents(documents)

# Query your documents
query_engine = index.as_query_engine()
response = query_engine.query(
    "What are the standard concrete mix specifications?"
)
print(response)
```
## IDE Recommendations
| IDE | Best For | Features |
|-----|----------|----------|
| Jupyter Notebook | Learning, experiments | Interactive cells, visualizations |
| Google Colab | Free GPU, quick start | Cloud-based, pre-installed libs |
| VS Code | Professional development | Extensions, GitHub Copilot |
| PyCharm | Large projects | Advanced debugging, refactoring |
### Quick Setup with Jupyter

```bash
pip install jupyter pandas openpyxl pdfplumber
jupyter notebook
```
## Best Practices

- **Start Simple:** Begin with clear, specific prompts
- **Iterate:** Refine prompts based on results
- **Validate:** Always check generated code before running
- **Document:** Save working prompts for reuse
- **Secure:** Use a local LLM for sensitive company data
## Common Prompts Library

### Data Import
- "Read Excel file and show first 10 rows"
- "Import CSV with custom delimiter and encoding"
- "Load multiple Excel sheets into dictionary of DataFrames"
### Data Cleaning
- "Remove duplicate rows based on element_id"
- "Fill missing values with column mean"
- "Convert column to numeric, handling errors"
### Data Analysis
- "Calculate descriptive statistics for numeric columns"
- "Find correlation between cost and duration"
- "Identify outliers using IQR method"
### Data Export
- "Export to Excel with multiple sheets"
- "Save to CSV with specific encoding"
- "Generate formatted PDF report"
## Resources
- Book: "Data-Driven Construction" by Artem Boiko, Chapter 2.3
- Website: https://datadrivenconstruction.io
- Pandas Documentation: https://pandas.pydata.org/docs/
- Ollama: https://ollama.com
- LM Studio: https://lmstudio.ai
- Google Colab: https://colab.research.google.com
## Next Steps

- See `pandas-construction-analysis` for advanced Pandas operations
- See `pdf-to-structured` for document processing
- See `etl-pipeline` for automated data pipelines
- See `rag-construction` for RAG implementation with construction documents