
Data Governance Guide

zhouning edited this page Mar 22, 2026 · 1 revision

GIS Data Agent v14.5: an AI geospatial platform built on Google ADK

Repository: https://github.com/zhouning/gisdataagent

What Is Data Governance?

GIS Data Agent's data governance workflow runs a six-dimension quality audit on spatial data, automatically detecting and repairing common issues:

| Dimension | Description |
| --- | --- |
| Topology integrity | Self-intersections, overlaps, holes, and other geometric topology errors |
| Attribute completeness | Missing rate of required fields, share of null values |
| Data gaps | Blank areas within the feature coverage extent |
| CRS consistency | Uniformity of the coordinate reference system (CRS) and plausibility of the projection |
| Attribute validity | Field value-range checks and outlier detection |
| Duplicates | Records whose geometry or attributes are exact duplicates |

Step-by-Step Governance Workflow

Step 1: Upload Your Data

Supported formats: Shapefile (.shp), GeoJSON, GeoPackage (.gpkg), CSV (with lat/lng columns)

Drag files directly into the chat panel or use the upload button. ZIP archives containing .shp/.kml/.geojson/.gpkg are auto-extracted.
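The auto-extraction step could look roughly like the sketch below. It is not the project's code: the function name `extract_spatial_files`, the sidecar list, and the choice to flatten the archive layout are all assumptions made for illustration. Only the four extensions come from this guide.

```python
import zipfile
from pathlib import Path

# Extensions this guide names as auto-extracted from ZIP uploads.
SPATIAL_EXTENSIONS = {".shp", ".kml", ".geojson", ".gpkg"}
# Assumption of this sketch: Shapefile sidecar files are extracted too,
# because a .shp is not readable without them.
SHAPEFILE_SIDECARS = {".shx", ".dbf", ".prj", ".cpg"}


def extract_spatial_files(zip_path, dest_dir):
    """Extract spatial files from a ZIP; return paths of the primary datasets."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    primary = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            suffix = Path(info.filename).suffix.lower()
            if suffix not in SPATIAL_EXTENSIONS | SHAPEFILE_SIDECARS:
                continue
            # Keep only the file name. This flattens nested folders and stops
            # entries such as "../../x.shp" from escaping dest_dir.
            target = dest / Path(info.filename).name
            target.write_bytes(archive.read(info))
            if suffix in SPATIAL_EXTENSIONS:
                primary.append(target)
    return sorted(primary)
```

Flattening means two files with the same name in different folders would overwrite each other; a real implementation would need a policy for that case.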

Step 2: Request a Governance Audit

Type in the chat panel:

  • Chinese: 请对这个数据进行治理审计
  • English: Run a governance audit on this dataset

Step 3: Intent Router → Governance Pipeline

The Semantic Intent Router (intent_router.py) uses Gemini 2.0 Flash to classify the request as a "governance" intent and dispatches it to the Governance Pipeline (SequentialAgent):

GovExploration → GovProcessing → GovernanceReportLoop
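To illustrate what "sequential" means here, the sketch below chains three placeholder stages in plain Python. It deliberately does not use the Google ADK API; the stage functions, the shared `state` dictionary, and `run_sequential` are hypothetical stand-ins, and the stage bodies are dummies.

```python
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]


def gov_exploration(state: Dict) -> Dict:
    # Placeholder: the real stage runs the quality checks.
    return {**state, "issues": ["crs_mismatch"]}


def gov_processing(state: Dict) -> Dict:
    # Placeholder: the real stage repairs the issues found upstream.
    return {**state, "fixed": list(state["issues"])}


def governance_report_loop(state: Dict) -> Dict:
    # Placeholder: the real stage drafts and refines the audit report.
    return {**state, "report": f"{len(state['fixed'])} issue(s) fixed"}


def run_sequential(stages: List[Stage], state: Dict) -> Dict:
    """Run stages in order; each stage sees the state left by the previous one."""
    for stage in stages:
        state = stage(state)
    return state


PIPELINE = [gov_exploration, gov_processing, governance_report_loop]
result = run_sequential(PIPELINE, {"dataset": "parcels.shp"})
```

The point of the design is that each stage depends only on the accumulated state, so the order of the list fully defines the workflow.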

Step 4: GovExploration — 11 Automated Checks

The governance exploration agent runs 11 quality checks covering all six dimensions. Each check produces a specific list of issues and a severity score.
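One way to picture the output of a check is a small record holding the issue list and the severity score. The class names, field names, and the 0.0 to 1.0 severity scale below are assumptions for illustration; the guide does not specify the actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Issue:
    feature_id: int
    message: str


@dataclass
class CheckResult:
    check_name: str
    dimension: str
    severity: float  # assumed scale: 0.0 (clean) to 1.0 (critical)
    issues: List[Issue] = field(default_factory=list)


def worst_first(results):
    """Order check results so the most severe ones come first."""
    return sorted(results, key=lambda r: r.severity, reverse=True)
```

Sorting by severity gives the processing stage a natural order in which to attempt repairs.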

Step 5: GovProcessing — Auto-Remediation

The governance processing agent performs automatic fixes based on check results:

  • Reprojection: Unify CRS to target reference system
  • Topology repair: Fix self-intersections, remove sliver polygons
  • Duplicate removal: Exact deduplication based on geometry + attributes
  • Attribute standardization: Null filling, value range corrections
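As an example of one of these fixes, exact deduplication on geometry plus attributes can be sketched in a few lines. The feature layout (a dict with a `wkt` string and an `attributes` dict of hashable values) is an assumption of this sketch, not the project's data model.

```python
def deduplicate(features):
    """Keep the first feature for each distinct (geometry, attributes) pair."""
    seen = set()
    unique = []
    for feature in features:
        # Geometry is compared as WKT text and attributes as sorted items, so
        # only exact matches collapse; near-duplicates are left untouched.
        key = (feature["wkt"], tuple(sorted(feature["attributes"].items())))
        if key not in seen:
            seen.add(key)
            unique.append(feature)
    return unique
```

Comparing WKT text is strict: the same shape written with a different vertex order or coordinate precision counts as a different geometry, which matches "exact" deduplication but would miss geometrically equal features.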

Step 6: GovernanceViz — Visualization

  • Radar chart: Six-dimension quality scores at a glance
  • Problem distribution map: Geographically visualize where data issues occur

Step 7: GovernanceReportLoop — Structured Audit Report

The governance report loop agent generates a structured audit report, iterating for up to three rounds to ensure report quality. The report includes an issue inventory, a remediation action log, a before/after comparison, and score changes.
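A bounded refinement loop of this kind might be sketched as follows. The round limit comes from this guide; the quality criterion (all four report sections present), the section keys, and the `draft_fn` callback are hypothetical.

```python
MAX_ROUNDS = 3  # the guide states the loop runs for at most 3 rounds

# Hypothetical section keys, based on the report contents listed in the guide.
REQUIRED_SECTIONS = (
    "issue inventory",
    "remediation log",
    "before/after comparison",
    "score changes",
)


def missing_sections(report):
    """List required sections that the report does not contain yet."""
    return [name for name in REQUIRED_SECTIONS if name not in report]


def build_report(draft_fn, max_rounds=MAX_ROUNDS):
    """Redraft until every section is present or the round budget is spent."""
    report, feedback = {}, list(REQUIRED_SECTIONS)
    for round_no in range(1, max_rounds + 1):
        report = draft_fn(report, feedback)
        feedback = missing_sections(report)
        if not feedback:
            break
    return report, round_no
```

Capping the rounds guarantees termination even when the drafting step never satisfies the criterion; the caller then receives the best draft so far.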

GovernanceToolset — 7 Governance Tools

| Tool Name | Function |
| --- | --- |
| check_gaps | Detect blank gaps in feature coverage |
| check_completeness | Assess attribute completeness (null rates, missing fields) |
| check_attribute_range | Check whether attribute values fall within valid ranges |
| check_duplicates | Find features with duplicate geometry or attributes |
| check_crs_consistency | Validate CRS consistency and projection validity |
| governance_score | Compute composite governance score (0–100) |
| governance_summary | Generate governance summary report |
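To give a feel for what a tool such as check_completeness measures, here is a minimal null-rate calculation. It is a sketch, not the tool's implementation: the function name `null_rates`, the record layout (a list of dicts), and the rule that `None` and the empty string count as missing are assumptions.

```python
def null_rates(records, required_fields):
    """Return the share of records where each required field is missing or empty."""
    total = len(records)
    rates = {}
    for name in required_fields:
        # A field counts as missing when the key is absent, None, or "".
        # A legitimate zero value is not treated as missing.
        missing = sum(1 for rec in records if rec.get(name) in (None, ""))
        rates[name] = missing / total if total else 0.0
    return rates
```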

Governance Scoring

Composite score ranges from 0–100 with corresponding letter grades:

| Score | Grade | Meaning |
| --- | --- | --- |
| 90–100 | A | Excellent: very high data quality |
| 80–89 | B | Good: minor issues present |
| 70–79 | C | Acceptable: some dimensions need attention |
| 60–69 | D | Poor: multiple dimensions have issues |
| <60 | F | Failing: major governance effort required |

Dimension weights are configurable; equal weighting is the default.
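The scoring scheme can be read as a weighted mean followed by a threshold lookup. The sketch below assumes each dimension is scored 0 to 100 and that a fractional composite such as 89.5 falls into the lower band; the guide does not state how the composite is actually computed, and the function names are illustrative.

```python
def composite_score(scores, weights=None):
    """Weighted mean of per-dimension scores (each 0-100); equal weights by default."""
    weights = weights or {name: 1.0 for name in scores}
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def letter_grade(score):
    """Map a 0-100 score to the letter grades listed in this guide."""
    for threshold, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= threshold:
            return grade
    return "F"
```

For example, six dimension scores of 90, 80, 70, 100, 60, and 80 average to 80 under equal weights, which maps to grade B.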

Cross-Modal Audit

The check_consistency tool supports cross-modal comparison auditing: it compares the data descriptions in PDF reports against the actual Shapefile data to find inconsistencies between the documentation and the spatial data.
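The comparison step at the heart of such an audit can be sketched once both sides are reduced to key/value facts. Extracting claims from a PDF and reading statistics from a Shapefile are out of scope here; the two dictionaries, their keys, and the function `compare_claims` are hypothetical.

```python
def compare_claims(claimed, actual):
    """Return one finding per field where the document and the data disagree."""
    findings = []
    for key, claimed_value in claimed.items():
        if key not in actual:
            findings.append(f"{key}: stated in document but not measurable in data")
        elif actual[key] != claimed_value:
            findings.append(
                f"{key}: document says {claimed_value!r}, data has {actual[key]!r}"
            )
    return findings
```

A call with `{"feature_count": 1200}` as the claim and `{"feature_count": 1187}` as the measured value would report a single mismatch on `feature_count`.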
