Knowledge Engine

Three-Layer Knowledge Pyramid + RAG Architecture + Research Intelligence Ingest

Three-Layer Knowledge Pyramid

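Every MI-OS decision must be traceable to the Ground Truth layer. The three layers are strictly isolated, and their responsibilities do not overlap.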

Layer 1 — GROUND TRUTH
Immutable, decision anchor
Layer 2 — PROCESSED SIGNAL
ChromaDB, must cite source
Layer 3 — WORKING MEMORY
Reports, read-only archive
GROUND TRUTH

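Immutable factual foundation

Every investment decision must trace back to at least one data point in this layer. Data in this layer never enters ChromaDB; it lives only in SQLite / GSheet / JSON.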

Data sources
XBRL Financials • yfinance / FRED • FinMind Market Data • EDGAR / MOPS Filings • PredTracker Settled • SC explicit_filing Edges
Storage location
DataStore Parquet + GSheet + supply_chain.json
Financial numbers NEVER enter ChromaDB. ChromaDB is the semantic-understanding layer; SQLite/GSheet is the numeric-computation layer. Their responsibilities never overlap.
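
The split between the numeric layer and the semantic layer can be enforced at write time. What follows is a minimal sketch under stated assumptions, not the MI-OS implementation: `route_record`, the `kind` field, and the `NUMERIC_KINDS` set are hypothetical names, and the two stores are in-memory lists standing in for SQLite/GSheet and ChromaDB.

```python
from dataclasses import dataclass, field

# Hypothetical record kinds treated as Ground Truth numbers (assumption).
NUMERIC_KINDS = {"xbrl_financial", "market_price", "macro_series"}


@dataclass
class Stores:
    """In-memory stand-ins for the numeric store and the vector store."""
    numeric: list = field(default_factory=list)   # stands in for SQLite / GSheet
    semantic: list = field(default_factory=list)  # stands in for ChromaDB


def route_record(record: dict, stores: Stores) -> str:
    """Send numeric Ground Truth to the numeric store, text to the semantic store."""
    if record["kind"] in NUMERIC_KINDS:
        stores.numeric.append(record)
        return "numeric"
    if "text" not in record:
        raise ValueError("semantic records need a 'text' field to embed")
    stores.semantic.append(record)
    return "semantic"
```

Routing on an explicit `kind` tag, rather than inspecting values, keeps the decision auditable: a record with numbers and no recognised tag fails loudly instead of silently landing in the vector store.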
PROCESSED SIGNAL

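Structured signals: usable, but the source must be cited

All news signals, deep research, and technical papers that have been processed by Gemini/Claude. OPINION / PREDICTION types must carry the [source opinion] marker.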

Data sources
Worker Structured JSON • Premium Newsletter • Paper Abstracts (TRL-tagged) • Book Frameworks • SC news_mentioned Edges • Research Intel
Storage location
ChromaDB Vector Store (7 lifecycle-based collections)
OPINION / PREDICTION types must be marked [source opinion] and must not be cited as facts.
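
The marking rule can be applied mechanically when a Processed Signal snippet is rendered into a report. The sketch below is illustrative only: `Epistemic`, `render_citation`, and `may_cite_as_fact` are hypothetical names, and the output format of the citation is an assumption. The source does not say how ANALYSIS is treated, so this sketch only blocks the two types it names.

```python
from enum import Enum


class Epistemic(Enum):
    FACT = "FACT"
    OPINION = "OPINION"
    ANALYSIS = "ANALYSIS"
    PREDICTION = "PREDICTION"


# Types that must carry the marker and may not be cited as fact.
MARKED = {Epistemic.OPINION, Epistemic.PREDICTION}
MARKER = "[source opinion]"


def render_citation(text: str, kind: Epistemic, source: str) -> str:
    """Format a Processed Signal snippet, prefixing the marker where required."""
    if not source:
        raise ValueError("Processed Signal must cite its source")
    prefix = f"{MARKER} " if kind in MARKED else ""
    return f"{prefix}{text} (source: {source})"


def may_cite_as_fact(kind: Epistemic) -> bool:
    """Block only the explicitly marked types (ANALYSIS handling is an assumption)."""
    return kind not in MARKED
```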
WORKING MEMORY

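System-generated reports: must not be used as a basis for decisions

Reports generated by the system itself (Deep/BOS/Industry/Daily) are strictly prohibited from being fed back into ChromaDB, to avoid a self-reinforcing loop.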

Data sources
Daily Boss Report • Deep Equity Reports • BOS Credit Reports • Industry Reports
Storage location
reports/ directory (archive, read-only)
P7 Firewall: entry into any ChromaDB collection is prohibited. Automated pipelines must never set allow_working_memory=True.
Consequence of violation: Error amplification via self-feedback loop (Model Collapse)
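
A firewall of this kind has two sides: refusing Working Memory at write time, and filtering it out at read time by default. The sketch below assumes a `layer` metadata field and uses word overlap as a toy stand-in for vector similarity. `add_document` and `FirewallError` are hypothetical names, and the real signature of `search_similar()` is not given in this document beyond the `allow_working_memory` flag.

```python
WORKING_MEMORY = "working_memory"


class FirewallError(Exception):
    """Raised when Working Memory content tries to enter a collection."""


def add_document(collection: list, doc: dict) -> None:
    """Append a document, refusing anything tagged as Working Memory."""
    if doc.get("layer") == WORKING_MEMORY:
        raise FirewallError("system-generated reports must stay in reports/")
    collection.append(doc)


def search_similar(query: str, collection: list,
                   allow_working_memory: bool = False) -> list:
    """Rank documents by words shared with the query (toy similarity)."""
    terms = set(query.lower().split())
    hits = []
    for doc in collection:
        if doc.get("layer") == WORKING_MEMORY and not allow_working_memory:
            continue  # filtered out unless the caller opts in explicitly
        overlap = len(terms & set(doc["text"].lower().split()))
        if overlap:
            hits.append((overlap, doc))
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in hits]
```

Filtering on read as well as on write means a report that leaks into a collection by some other path still does not surface in automated retrieval.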

Ingestion Data Flow

User Input (URL / Text / PDF)
→ Gemini Structuring (Extract JSON)
→ research_intel/ (Structured JSON)
→ Downstream consumers:
• News Annotation
• Report Enrichment
• RAG ChromaDB
• Financial Model
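
The flow above (raw input, structuring step, JSON persisted under research_intel/) can be sketched as follows. This is an assumption-laden illustration: `ingest` and `fake_structurer` are hypothetical names, the structuring step is a stub in place of the Gemini call, and the hash-based file naming is a design choice of this sketch rather than something the source specifies.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def ingest(raw_text: str, structure: Callable[[str], dict], out_dir: Path) -> Path:
    """Structure raw text and persist it as one JSON file under out_dir."""
    record = structure(raw_text)
    if not isinstance(record, dict):
        raise TypeError("structuring step must return a JSON object")
    # Name the file by content hash so re-ingesting the same text
    # overwrites the earlier file instead of creating a duplicate.
    digest = hashlib.sha256(raw_text.encode("utf-8")).hexdigest()[:16]
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{digest}.json"
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2),
                    encoding="utf-8")
    return path


def fake_structurer(text: str) -> dict:
    """Placeholder for the LLM call; returns a trivially structured record."""
    return {"summary": text[:80], "length": len(text)}
```

Passing the structurer in as a callable keeps the persistence step testable without network access; downstream consumers would then read the JSON files rather than the raw input.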

Seven Design Principles

P1
Knowledge Three-Layer Isolation
Ground Truth / Processed Signal / Working Memory are strictly isolated; they must not be stored together or cross-referenced.
Violation → Model Collapse
P2
Financial Data Never in Vector DB
SQLite/GSheet store numbers; ChromaDB stores semantics. Putting financial numbers into ChromaDB breaks trend computation.
Violation → Trend computation breaks
P3
News Lifecycle Classification
EPHEMERAL (7d) / SHORT_TERM (30d) / STRUCTURAL (∞) / TIMELESS (∞). News is classified by lifecycle.
Violation → Sentiment noise pollutes decisions
P4
Content Epistemics Tagging
FACT / OPINION / ANALYSIS / PREDICTION. OPINION must be marked [source opinion] and must not be cited as fact.
Violation → Opinions mistaken for facts
P5
Parent-Child Hierarchical Retrieval
Embeddings are made at paragraph level (for precision); on a hit, retrieval walks up to the full section. Books/papers require parent_id metadata.
Violation → Derivation chain breaks
P6
TRL Time-Horizon Labeling
Technical signals must be labeled with a TRL (1-9). TRL < 7 automatically lowers conviction. Paper to mass production: 2-4 yr.
Violation → Lab results mistaken for near-term catalysts
P7
Working Memory Firewall
search_similar() defaults to allow_working_memory=False. Automated pipelines must never set it to True.
Violation → Self-feedback loop
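
Principle P5 (parent-child retrieval) can be illustrated with a small sketch: match on paragraphs, then return the full parent section through `parent_id`. Only `parent_id` comes from the text above. The function name, the data layout, and the word-overlap scoring (a stand-in for embedding similarity) are assumptions.

```python
def retrieve_with_parent(query: str, children: list, parents: dict) -> list:
    """Match on child paragraphs, return full parent sections in hit order."""
    terms = set(query.lower().split())
    scored = sorted(
        ((len(terms & set(c["text"].lower().split())), c) for c in children),
        key=lambda pair: pair[0],
        reverse=True,
    )
    seen, results = set(), []
    for score, child in scored:
        if score == 0:
            break  # remaining children share no words with the query
        pid = child["parent_id"]
        if pid not in seen:  # several paragraphs may share one parent
            seen.add(pid)
            results.append(parents[pid])
    return results
```

Deduplicating on `parent_id` matters because two matching paragraphs from the same chapter should yield that chapter once, not twice.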

ChromaDB Collection Architecture

Collection | Lifecycle | Auto-Cleanup | Primary Sources | Status
ephemeral_signals | EPHEMERAL | 7 days | Sentiment news, after-hours | Active
short_term_signals | SHORT_TERM | 30 days | Earnings events, launches | Active
structural_signals | STRUCTURAL | Never | CC confirmed, regulatory, SC | Active
knowledge_frameworks | TIMELESS | Never | Books, deep analysis | Active
tech_signals | TIMELESS | Never | Papers, patents (TRL-tagged) | Active
predictions | PERMANENT | Never | Boss signals, PredTracker | Active
sc_events | PERMANENT | Never | SC edges (news_mentioned+) | Active
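
The auto-cleanup column can be expressed as a retention map plus an expiry check. This is a minimal sketch: the retention values come from the table, while `RETENTION`, `is_expired`, `purge`, the `ingested_at` field, and the strict "older than" boundary are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Retention per lifecycle; None means the collection is never auto-cleaned.
RETENTION = {
    "EPHEMERAL": timedelta(days=7),
    "SHORT_TERM": timedelta(days=30),
    "STRUCTURAL": None,
    "TIMELESS": None,
    "PERMANENT": None,
}


def is_expired(lifecycle: str, ingested_at: datetime, now: datetime) -> bool:
    """True when a document has outlived its lifecycle's retention window."""
    ttl = RETENTION[lifecycle]
    return ttl is not None and now - ingested_at > ttl


def purge(docs: list, now: datetime) -> list:
    """Return only the documents that are still within retention."""
    return [d for d in docs
            if not is_expired(d["lifecycle"], d["ingested_at"], now)]
```

An unknown lifecycle raises `KeyError` here rather than defaulting to "keep" or "delete", so a mislabeled document is surfaced instead of silently retained or dropped.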

Research Intelligence Ingest

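Submit a research article and the system automatically structures it into JSON and injects it into downstream pipelines such as news analysis, industry reports, and financial models.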

Supported file formats: PDF • DOCX • XLSX • PPTX • TXT • MD • CSV