
kb-article-normalize

Native · knowledge.ingest

Parses WeChat-style or generic article HTML into structured fields: title, published_at, author (author_display), is_original (a heuristic), body_text, and optional tags. The gene makes no network calls; the host fetches the HTML and passes it in as raw_html.
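To make the description above concrete, here is a minimal sketch of such a normalizer using only the Python standard library. It is not the gene's actual implementation: the function name `normalize_article`, the warning string `title_not_found`, the 80-character summary cut, and the "contains 原创" check are all assumptions chosen for illustration. The date and author fields are left empty here, which the output schema permits for unknown values.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects <title> text and visible text, skipping script/style content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.body_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._skip_depth == 0:
            self.body_parts.append(data)


def normalize_article(raw_html, fallback_title=""):
    """Hypothetical normalizer: returns a dict shaped like the output schema."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()

    warnings = []
    # Collapse runs of whitespace in the title.
    title = " ".join("".join(parser.title_parts).split())
    if not title:
        title = fallback_title
        warnings.append("title_not_found")  # assumed warning code

    # Whitespace-normalized plain text body.
    body_text = " ".join(" ".join(parser.body_parts).split())

    return {
        "title": title,
        "published_at": "",        # unknown in this sketch
        "author_display": "",      # unknown in this sketch
        "is_original": "原创" in body_text,  # crude marker heuristic (assumption)
        "tags": [],
        "body_text": body_text,
        "summary_one_line": body_text[:80],  # assumed summary rule
        "warnings": warnings,
    }
```

A host would fetch the page itself and then call `normalize_article(raw_html, fallback_title="Untitled")`; the function never touches the network.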

v0.1.0 · July 5, 2026
Newer version available: v0.1.1

README

No documentation yet.

Gene authors can add a README when publishing.

Phenotype

Input

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| raw_html | string | yes | Full HTML of a public article page (e.g. mp.weixin.qq.com) or fragment. |
| fetched_at | string | no | ISO timestamp when host fetched the page (optional). |
| source_url | string | no | Canonical URL for traceability (optional). |
| fallback_title | string | no | Used when title cannot be extracted from HTML. |

Output

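| Property | Type | Required | Description |
| --- | --- | --- | --- |
| tags | array | yes | |
| title | string | yes | |
| warnings | array | yes | |
| body_text | string | yes | Plain text body, whitespace normalized. |
| is_original | boolean | yes | Heuristic from page markers (e.g. 原创). |
| published_at | string | yes | ISO-8601 when parsed; empty if unknown. |
| author_display | string | yes | |
| summary_one_line | string | yes | |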
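The published_at rule (ISO-8601 when parsed, empty if unknown) could be sketched as below. The helper name `to_published_at` and the three accepted date formats are assumptions for illustration; the formats the gene actually recognizes are not documented here. The result is a naive timestamp with no timezone attached.

```python
from datetime import datetime

# Hypothetical list of date formats a page might display.
_FORMATS = ("%Y-%m-%d %H:%M", "%Y-%m-%d", "%Y年%m月%d日")


def to_published_at(text):
    """Return an ISO-8601 string if text matches a known format, else ""."""
    cleaned = text.strip()
    for fmt in _FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).isoformat()
        except ValueError:
            continue  # try the next format
    return ""  # unknown: the schema asks for an empty string
```

For example, `to_published_at("2026年7月5日")` returns `"2026-07-05T00:00:00"`, while `to_published_at("yesterday")` returns `""`.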
Raw JSON Schema

inputSchema

{
  "type": "object",
  "required": [
    "raw_html"
  ],
  "properties": {
    "raw_html": {
      "type": "string",
      "description": "Full HTML of a public article page (e.g. mp.weixin.qq.com) or fragment."
    },
    "fetched_at": {
      "type": "string",
      "description": "ISO timestamp when host fetched the page (optional)."
    },
    "source_url": {
      "type": "string",
      "description": "Canonical URL for traceability (optional)."
    },
    "fallback_title": {
      "type": "string",
      "description": "Used when title cannot be extracted from HTML."
    }
  }
}
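A host may want to check a payload against the inputSchema before invoking the gene. The following is a small hand-rolled sketch, not a full JSON Schema validator; the function name `check_input` and the wording of the problem messages are assumptions. Extra properties are allowed because the schema does not forbid them.

```python
INPUT_FIELDS = ("raw_html", "fetched_at", "source_url", "fallback_title")


def check_input(payload):
    """Return a list of problems; an empty list means the payload fits the schema."""
    if not isinstance(payload, dict):
        return ["payload must be an object"]
    problems = []
    # raw_html is the only required property.
    if "raw_html" not in payload:
        problems.append("missing required field: raw_html")
    # Every declared property is typed as a string.
    for name in INPUT_FIELDS:
        if name in payload and not isinstance(payload[name], str):
            problems.append(f"{name} must be a string")
    return problems
```

A minimal valid payload is `{"raw_html": "<p>...</p>"}`; the other three properties can be added when the host has them.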

outputSchema

{
  "type": "object",
  "required": [
    "title",
    "published_at",
    "author_display",
    "is_original",
    "tags",
    "body_text",
    "summary_one_line",
    "warnings"
  ],
  "properties": {
    "tags": {
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "title": {
      "type": "string"
    },
    "warnings": {
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "body_text": {
      "type": "string",
      "description": "Plain text body, whitespace normalized."
    },
    "is_original": {
      "type": "boolean",
      "description": "Heuristic from page markers (e.g. 原创)."
    },
    "published_at": {
      "type": "string",
      "description": "ISO-8601 when parsed; empty if unknown."
    },
    "author_display": {
      "type": "string"
    },
    "summary_one_line": {
      "type": "string"
    }
  }
}
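On the consuming side, a result can be checked against the outputSchema in the same spirit. This sketch assumes a hypothetical helper named `check_output` and covers only what the schema states: all eight properties are required, with the listed types, and the two arrays hold strings.

```python
OUTPUT_TYPES = {
    "title": str,
    "published_at": str,
    "author_display": str,
    "is_original": bool,
    "tags": list,
    "body_text": str,
    "summary_one_line": str,
    "warnings": list,
}


def check_output(result):
    """Return a list of problems; an empty list means the result fits the schema."""
    if not isinstance(result, dict):
        return ["result must be an object"]
    problems = []
    for name, expected in OUTPUT_TYPES.items():
        if name not in result:
            problems.append(f"missing required field: {name}")
        elif not isinstance(result[name], expected):
            problems.append(f"{name} has the wrong type")
        elif expected is list and not all(isinstance(i, str) for i in result[name]):
            problems.append(f"{name} items must be strings")
    return problems
```

Note that an empty `published_at` passes this check, since the schema uses an empty string to mean "unknown".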