Document AI

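Document AI uses vision models (such as GPT-4o) to extract structured data from PDFs and images: invoices, forms, receipts, contracts, or any document that contains visible text.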

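The SDK exposes two namespaces:

  • client.chatAi / client.chat_ai: direct document processing and model listing
  • client.documentAi / client.document_ai: agent-based document processing, including schema management, agent CRUD, and end-to-end orchestration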

Initialize the client first (see the installation quick start).


client.chatAi: document processing

List available models

const models = await client.chatAi.listDocumentModels();

Direct document processing

const result = await client.chatAi.processDocument({
  modelName: "gpt-4o",
  url: "https://example.com/invoice.pdf",
  organizationId: "org_xxx",
});

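Parameter reference

| Parameter | TS field | Python parameter | Type | Required |
| --- | --- | --- | --- | --- |
| Model name | modelName | model_name | string / str | |
| Document URL | url | url | string / str | |
| Organization ID | organizationId | organization_id | string / str | |
| Board ID | boardId | board_id | string / str | |
| Language | language | language | string / str | |
| Additional instructions | additionalInstructions | additional_instructions | string / str | |
| Additional document instructions | additionalDocumentInstructions | additional_document_instructions | string / str | |
| Processing model name | processModelName | process_model_name | string / str | |
| URL of the file to fill | fileUrlToFill | file_url_to_fill | string / str | |
| Tools | tools | tools | Record<string, unknown>[] / List[Dict] | |
| UTC offset | utc | utc | number / int | |
| Chunk size | chunkSize | chunk_size | number / int | |
| Max concurrency | maxConcurrent | max_concurrent | number / int | |
| Max retries | maxRetries | max_retries | number / int | |
| Enhanced processing | useEnhancedProcessing | use_enhanced_processing | boolean / bool | |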
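As an illustration of how the numeric fields in the table might be combined with a basic request, here is a minimal sketch. The ProcessDocumentOptions interface, the withTuning helper, and the minimum values are hypothetical and not part of the SDK; treating modelName, url, and organizationId as required is also an assumption based on the examples on this page.

```typescript
// Hypothetical subset of the processDocument options, named after the TS
// fields in the table above. Which fields are required is an assumption.
interface ProcessDocumentOptions {
  modelName: string;
  url: string;
  organizationId: string;
  language?: string;
  additionalInstructions?: string;
  chunkSize?: number;
  maxConcurrent?: number;
  maxRetries?: number;
  useEnhancedProcessing?: boolean;
}

type Tuning = Pick<ProcessDocumentOptions, "chunkSize" | "maxConcurrent" | "maxRetries">;

// Assumed lower bounds; the real API may enforce different limits.
const MINIMUMS: Required<Tuning> = { chunkSize: 1, maxConcurrent: 1, maxRetries: 0 };

// Merge tuning values into a base request, rejecting values that are not
// integers at or above the assumed minimum.
function withTuning(base: ProcessDocumentOptions, tuning: Tuning): ProcessDocumentOptions {
  for (const key of Object.keys(MINIMUMS) as (keyof Tuning)[]) {
    const value = tuning[key];
    if (value !== undefined && (!Number.isInteger(value) || value < MINIMUMS[key])) {
      throw new RangeError(`${key} must be an integer >= ${MINIMUMS[key]}, got ${value}`);
    }
  }
  return { ...base, ...tuning };
}

const options = withTuning(
  { modelName: "gpt-4o", url: "https://example.com/invoice.pdf", organizationId: "org_xxx" },
  { chunkSize: 5, maxConcurrent: 2, maxRetries: 3 },
);
```

The resulting options object can then be passed to client.chatAi.processDocument(options).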

PDF form-filling example

const result = await client.chatAi.processDocument({
  modelName: "gpt-4o",
  url: "https://example.com/blank-invoice.pdf",
  organizationId: "org_xxx",
  fileUrlToFill: "https://example.com/blank-invoice.pdf",
  language: "en",
});
if (result.success && result.data.filledPdfUrl) {
  console.log("Filled PDF:", result.data.filledPdfUrl);
}
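The success check above can be factored into a small helper. This is a minimal sketch that assumes only the two fields the example reads, success and data; the DocumentResult type and the unwrapResult name are hypothetical and not part of the SDK.

```typescript
// Hypothetical result shape: only `success` and `data` are taken from the
// example above; the SDK's real type may carry more fields.
interface DocumentResult<T> {
  success: boolean;
  data?: T;
}

// Return the payload of a successful result, or throw so callers do not
// silently continue with missing data.
function unwrapResult<T>(result: DocumentResult<T>): T {
  if (!result.success || result.data === undefined) {
    throw new Error("Document processing did not succeed");
  }
  return result.data;
}
```

With it, a caller can write const data = unwrapResult(result) and then read data.filledPdfUrl without repeating the check.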

client.documentAi: agent CRUD

A Document AI agent stores the extraction schema, instructions, and model settings for reuse.

List agents

// All agents
const agents = await client.documentAi.listAgents();
// Filter by name
const filtered = await client.documentAi.listAgents({ nameContains: "Invoice" });
// Document AI agents only (created through createFull or the webapp)
const docAiAgents = await client.documentAi.listAgents({ documentAiOnly: true });
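Judging by its name, nameContains is a substring filter and can return several agents. The sketch below picks the one whose name matches exactly. The AgentSummary shape is an assumption: _id appears in the Python example on this page, and name is inferred from the createAgent payload; findAgentId is a hypothetical helper, not an SDK function.

```typescript
// Hypothetical agent shape: `_id` appears in the Python example on this page;
// `name` is assumed from the createAgent payload.
interface AgentSummary {
  _id: string;
  name: string;
}

// Return the id of the agent whose name matches exactly, or undefined when
// no agent in the list has that name.
function findAgentId(agents: AgentSummary[], name: string): string | undefined {
  return agents.find((agent) => agent.name === name)?._id;
}
```

For example, findAgentId(filtered, "Invoice Extractor") would return the id to pass to getAgent or process, assuming listAgents resolves to an array of such objects.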

Get an agent

const agent = await client.documentAi.getAgent("agent_id");

Create an agent

const agent = await client.documentAi.createAgent({
  name: "Invoice Extractor",
  instructions: "Extract invoice fields. Dates as YYYY-MM-DD.",
  model_id: "gpt-4o",
  schema: {
    invoice_number: { type: "string", description: "Invoice ID" },
    total: { type: "number" },
    date: { type: "string", format: "date" },
  },
});

Update an agent

await client.documentAi.updateAgent("agent_id", {
  name: "Invoice Extractor v2",
  instructions: "Updated extraction logic.",
});

Delete an agent

await client.documentAi.deleteAgent("agent_id");

client.documentAi: processing with an agent

Process a document with a configured agent; the model and instructions are looked up from the agent.

const result = await client.documentAi.process({
  agentId: "agent_id",
  url: "https://example.com/invoice.pdf",
  organizationId: "org_xxx",
});

You can also override the agent's model or instructions:

const result = await client.documentAi.process({
  agentId: "agent_id",
  url: "https://example.com/invoice.pdf",
  organizationId: "org_xxx",
  modelName: "gpt-4o", // overrides the agent's model
  instructions: "Custom prompt", // overrides the agent's instructions
});
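When the same agent handles several documents, the request objects differ only in url. The following sketch builds them from a list of URLs; the AgentProcessRequest type and buildAgentRequests helper are hypothetical, and only the field names are taken from the examples above.

```typescript
// Request fields taken from the process() examples above.
interface AgentProcessRequest {
  agentId: string;
  url: string;
  organizationId: string;
  modelName?: string;
  instructions?: string;
}

type Overrides = Pick<AgentProcessRequest, "modelName" | "instructions">;

// Build one request per document URL, sharing the agent, the organization,
// and any overrides.
function buildAgentRequests(
  agentId: string,
  organizationId: string,
  urls: string[],
  overrides: Overrides = {},
): AgentProcessRequest[] {
  return urls.map((url) => ({ agentId, url, organizationId, ...overrides }));
}
```

Each element can then be passed to client.documentAi.process in a loop. Omitting an override leaves that setting to the agent.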

client.documentAi: suggest a schema

Proposes a JSON schema automatically by analyzing a sample document.

const schema = await client.documentAi.suggestSchema({
  url: "https://example.com/invoice.pdf",
  organizationId: "org_xxx",
  modelName: "gpt-4o", // optional, defaults to "gpt-4o"
});

client.documentAi: full creation (orchestrator)

End-to-end Document AI agent creation (the approach the iMBRACE webapp uses): creates a board that holds the extraction schema, then creates a UseCase and an AI Agent linked to that board.

const result = await client.documentAi.createFull({
  name: "Invoice Extractor",
  instructions: "Extract invoice fields. Dates as YYYY-MM-DD.",
  schemaFields: [
    { name: "invoice_number", type: "ShortText", description: "Invoice ID" },
    { name: "total", type: "Number", description: "Total amount" },
    { name: "date", type: "Date", description: "Invoice date" },
  ],
  modelId: "gpt-4o",
  providerId: "system",
});
console.log(result.board_id); // "brd_xxx"
console.log(result.ai_agent_id); // UUID of the created AI Agent
console.log(result.usecase_id); // UUID of the created UseCase
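createAgent takes a JSON-Schema-style schema object, while createFull takes a schemaFields array of board fields. The sketch below converts one into the other. The type mapping (string to ShortText, number to Number, string with format "date" to Date) is an assumption inferred by comparing the two examples on this page, and toSchemaFields is a hypothetical helper, not an SDK function.

```typescript
// JSON-Schema-style property, as used in the createAgent example.
interface SchemaProperty {
  type: "string" | "number";
  format?: string;
  description?: string;
}

// Board field, as used in the createFull example. Only the three field types
// shown on this page are covered; the SDK may accept more.
interface BoardField {
  name: string;
  type: "ShortText" | "Number" | "Date";
  description?: string;
}

// Convert each schema property into a board field, keeping key order.
function toSchemaFields(schema: Record<string, SchemaProperty>): BoardField[] {
  return Object.entries(schema).map(([name, prop]) => {
    let type: BoardField["type"];
    if (prop.type === "number") type = "Number";
    else if (prop.format === "date") type = "Date";
    else type = "ShortText";
    const field: BoardField = { name, type };
    if (prop.description !== undefined) field.description = prop.description;
    return field;
  });
}

const fields = toSchemaFields({
  invoice_number: { type: "string", description: "Invoice ID" },
  total: { type: "number" },
  date: { type: "string", format: "date" },
});
```

The fields array has the same shape as the schemaFields value in the createFull example, minus the descriptions the source schema did not provide.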

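Full creation options

| Parameter | TS field | Python parameter | Type | Default |
| --- | --- | --- | --- | --- |
| Name | name | name | string / str | |
| Instructions | instructions | instructions | string / str | |
| Schema fields | schemaFields | schema_fields | CreateBoardFieldInput[] / List[Dict] | |
| Model ID | modelId | model_id | string / str | |
| Provider ID | providerId | provider_id | string / str | |
| Description | description | description | string / str | None |
| VLM model | vlmModel | vlm_model | string / str | modelId |
| VLM provider | vlmProviderId | vlm_provider_id | string / str | providerId |
| Source languages | sourceLanguages | source_languages | string[] / List[str] | ["English"] |
| Handwriting support | handwritingSupport | handwriting_support | boolean / bool | false |
| Time offset | timeOffset | time_offset | string / str | "UTC+00:00" |
| Continue on failure | continueOnFailure | continue_on_failure | boolean / bool | false |
| Retry time | retryTime | retry_time | number / int | 2 |
| Temperature | temperature | temperature | number / float | 0.1 |
| Demo URL | demoUrl | demo_url | string / str | None |
| Team IDs | teamIds | team_ids | string[] / List[str] | [] |
| Extra AI Agent fields | extraAiAgent | extra_ai_agent | Record<string, unknown> / Dict | None |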
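To make the effective configuration explicit, the defaults in the table can be resolved on the client side before calling createFull. This is a sketch only: the page does not say whether the SDK or the server applies these defaults, and the CreateFullInput type and withDocumentedDefaults helper are hypothetical. Fields whose default is None, and schemaFields, are left out for brevity.

```typescript
// Hypothetical subset of the createFull options from the table above.
interface CreateFullInput {
  name: string;
  instructions: string;
  modelId: string;
  providerId: string;
  vlmModel?: string;
  vlmProviderId?: string;
  sourceLanguages?: string[];
  handwritingSupport?: boolean;
  timeOffset?: string;
  continueOnFailure?: boolean;
  retryTime?: number;
  temperature?: number;
  teamIds?: string[];
}

// Fill every omitted option with the default listed in the table.
// vlmModel and vlmProviderId fall back to modelId and providerId.
function withDocumentedDefaults(input: CreateFullInput): Required<CreateFullInput> {
  return {
    ...input,
    vlmModel: input.vlmModel ?? input.modelId,
    vlmProviderId: input.vlmProviderId ?? input.providerId,
    sourceLanguages: input.sourceLanguages ?? ["English"],
    handwritingSupport: input.handwritingSupport ?? false,
    timeOffset: input.timeOffset ?? "UTC+00:00",
    continueOnFailure: input.continueOnFailure ?? false,
    retryTime: input.retryTime ?? 2,
    temperature: input.temperature ?? 0.1,
    teamIds: input.teamIds ?? [],
  };
}

const resolved = withDocumentedDefaults({
  name: "Invoice Extractor",
  instructions: "Extract invoice fields. Dates as YYYY-MM-DD.",
  modelId: "gpt-4o",
  providerId: "system",
  retryTime: 5,
});
```

Here resolved.vlmModel falls back to "gpt-4o" and resolved.vlmProviderId to "system", while the explicit retryTime of 5 is kept.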

Async usage (Python)

import asyncio

from imbrace import AsyncImbraceClient


async def main():
    async with AsyncImbraceClient() as client:
        # Direct processing (chat_ai)
        models = await client.chat_ai.list_document_models()
        result = await client.chat_ai.process_document(
            model_name="gpt-4o",
            url="https://example.com/invoice.pdf",
            organization_id="org_xxx",
        )
        # Agent-based processing (document_ai)
        agents = await client.document_ai.list_agents(document_ai_only=True)
        result2 = await client.document_ai.process(
            agent_id=agents[0]["_id"],
            url="https://example.com/receipt.pdf",
            organization_id="org_xxx",
        )


asyncio.run(main())
