auto-git:
[add] README.md
[add] backend/libraries/punk/library.json
[add] backend/libraries/punk/stage/19f1e5d2ceaab5fd1f1dc58ff07422388f156610d16dfdea2bdb35a5b9e70813--GeorgeJordac-TheVoiceOfHumanJustice.pdf
[add] backend/libraries/punk/stage/85fce554ff7685f7bccb136aff5768e54b9ba8361672fe45dbce599598c4be4b--4_Strings_-_Take_Me_Away_Into_The_Night_Vocal_Radio_Mix_.mp3
[add] backend/libraries/punk/stage/e816ca61aebd84159747d248fedd6d5ff318c471c36bcc31b1ac6bf9aebcd3c1--The_Evolution_of_Cooperation_Robert_Axelrod_liber3.pdf
[add] backend/local_rag.py
[add] backend/rag/__init__.py
[add] backend/rag/corpus_builder.py
[add] backend/rag/corpus_enricher.py
[add] backend/rag/index_builder.py
[add] backend/rag/unified_rag.py
[add] dist/assets/index-Cc0DLWqA.css
[add] dist/assets/index-DKAz6gtp.js
[add] dist/index.html
[add] src/LibraryManager.jsx
[add] wheelcheck2117/pydantic-2.11.7-py3-none-any.whl
[add] wheelcheck274/pydantic-2.7.4-py3-none-any.whl
[change] backend/main.py
[change] backend/requirements.txt
[change] backend/schemas.py
[change] electron/main.cjs
[change] electron/preload.cjs
[change] package.json
[change] run.sh
[change] src/App.jsx
[change] src/InterfaceSettings.jsx
[change] src/colorSchemes.js
[change] src/main.jsx
[change] src/styles.css
89
README.md
Normal file
@@ -0,0 +1,89 @@
# Heimgeist

Heimgeist is a local desktop chat client for Ollama. It combines an Electron + React renderer with a FastAPI backend, stores chat history in SQLite, supports optional SearXNG-backed web search, and can enrich prompts with context from local library indexes.

## Features

- Local desktop chat UI with Electron
- Ollama-backed chat with streaming and non-streaming replies
- Persistent chat sessions and automatic title generation
- Edit-and-regenerate flow for earlier user messages
- Optional web search enrichment with source chips
- Local library management for RAG-style prompt enrichment
- Theme selection and UI scale controls

## Local Libraries

The `DBs` tab is no longer a placeholder. You can:

- create and rename libraries
- register files and folders
- build, enrich, and index library content
- mark one library as active for chat context
- open or remove registered files from the UI

When a chat library is active, Heimgeist queries it before sending a message and appends the returned context block to the prompt.

## Stack

- Frontend: Electron, React, Vite
- Backend: FastAPI, SQLAlchemy, SQLite
- Search enrichment: SearXNG + page fetching/reranking
- Local RAG pipeline: corpus build, enrichment, embedding, and retrieval helpers under `backend/rag/`

## Development

Requirements:

- Node.js 18+
- Python 3.13
- Ollama running locally
- Optional: SearXNG on `http://localhost:8888`

Quick start:

```bash
./run.sh
```

This creates or refreshes `backend/.venv`, installs Python dependencies, installs npm dependencies, and starts the dev stack.

Manual startup:

```bash
python3.13 -m venv backend/.venv
backend/.venv/bin/python -m pip install -r backend/requirements.txt
npm install
npm run dev
```

## Project Layout

```text
.
├── backend/
│   ├── main.py
│   ├── local_rag.py
│   ├── rag/
│   ├── websearch.py
│   ├── ollama_client.py
│   ├── models.py
│   ├── database.py
│   ├── schemas.py
│   └── requirements.txt
├── electron/
│   ├── main.cjs
│   └── preload.cjs
├── src/
│   ├── App.jsx
│   ├── LibraryManager.jsx
│   ├── GeneralSettings.jsx
│   ├── InterfaceSettings.jsx
│   ├── WebsearchSettings.jsx
│   ├── markdown.js
│   ├── colorSchemes.js
│   └── styles.css
├── package.json
├── run.sh
└── vite.config.js
```
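The README notes that when a chat library is active, its returned context block is appended to the outgoing prompt. A minimal sketch of that concatenation step, assuming the `context_block` field returned by the backend's `/libraries/{slug}/context` endpoint; the helper name is hypothetical, and the actual wiring lives in `src/App.jsx`:

```python
# Hypothetical helper illustrating the append step described in the README.
# "context_block" mirrors the field returned by /libraries/{slug}/context.
def apply_library_context(prompt: str, context_block: str) -> str:
    """Append a non-empty context block after the user prompt."""
    if not context_block:
        return prompt
    return f"{prompt}\n\n{context_block}"

combined = apply_library_context(
    "What does Axelrod say about reciprocity?",
    "<local_rag_context>\n[L1] The Evolution of Cooperation\nsnippet...\n</local_rag_context>",
)
```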
32
backend/libraries/punk/library.json
Normal file
@@ -0,0 +1,32 @@
{
  "id": "f5194228933140b68625347333749baf",
  "name": "Punk",
  "slug": "punk",
  "created_at": "2026-03-19T20:02:20Z",
  "files": [
    {
      "sha256": "e816ca61aebd84159747d248fedd6d5ff318c471c36bcc31b1ac6bf9aebcd3c1",
      "path": "/Users/giers/Documents/The Evolution of Cooperation_Robert Axelrod_liber3.pdf",
      "rel": "e816ca61aebd84159747d248fedd6d5ff318c471c36bcc31b1ac6bf9aebcd3c1--The_Evolution_of_Cooperation_Robert_Axelrod_liber3.pdf",
      "name": "The Evolution of Cooperation_Robert Axelrod_liber3.pdf",
      "size": 1208035,
      "added_at": "2026-03-19T20:02:53Z"
    },
    {
      "sha256": "19f1e5d2ceaab5fd1f1dc58ff07422388f156610d16dfdea2bdb35a5b9e70813",
      "path": "/Users/giers/Documents/GeorgeJordac-TheVoiceOfHumanJustice.pdf",
      "rel": "19f1e5d2ceaab5fd1f1dc58ff07422388f156610d16dfdea2bdb35a5b9e70813--GeorgeJordac-TheVoiceOfHumanJustice.pdf",
      "name": "GeorgeJordac-TheVoiceOfHumanJustice.pdf",
      "size": 849816,
      "added_at": "2026-03-19T20:04:17Z"
    },
    {
      "sha256": "85fce554ff7685f7bccb136aff5768e54b9ba8361672fe45dbce599598c4be4b",
      "path": "/Users/giers/Music/4 Strings - Take Me Away (Into The Night) (Vocal Radio Mix).mp3",
      "rel": "85fce554ff7685f7bccb136aff5768e54b9ba8361672fe45dbce599598c4be4b--4_Strings_-_Take_Me_Away_Into_The_Night_Vocal_Radio_Mix_.mp3",
      "name": "4 Strings - Take Me Away (Into The Night) (Vocal Radio Mix).mp3",
      "size": 7994108,
      "added_at": "2026-03-19T20:06:30Z"
    }
  ]
}
@@ -0,0 +1 @@
/Users/giers/Documents/GeorgeJordac-TheVoiceOfHumanJustice.pdf
@@ -0,0 +1 @@
/Users/giers/Music/4 Strings - Take Me Away (Into The Night) (Vocal Radio Mix).mp3
@@ -0,0 +1 @@
/Users/giers/Documents/The Evolution of Cooperation_Robert Axelrod_liber3.pdf
526
backend/local_rag.py
Normal file
@@ -0,0 +1,526 @@
from __future__ import annotations

import asyncio
import functools
import hashlib
import importlib
import json
import os
import re
import shutil
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List, Optional
from urllib.parse import quote

from fastapi import APIRouter, HTTPException
from pydantic import BaseModel


router = APIRouter(tags=["local-rag"])

LIB_ROOT = Path(__file__).parent / "libraries"
LIB_ROOT.mkdir(parents=True, exist_ok=True)

JOB_EXECUTOR = ThreadPoolExecutor(max_workers=2)
JOBS: Dict[str, Dict[str, Any]] = {}
LIB_LOCKS: Dict[str, asyncio.Lock] = {}


class CreateLibraryRequest(BaseModel):
    name: str


class RenameLibraryRequest(BaseModel):
    name: str


class RegisterPathsRequest(BaseModel):
    paths: List[str]


class RemoveFileRequest(BaseModel):
    rel: str


class EmbedLibraryRequest(BaseModel):
    embed_model: str = "dengcao/Qwen3-Embedding-0.6B:F16"
    ollama: str = "http://localhost:11434"
    target_chars: int = 2000
    overlap_chars: int = 200
    concurrency: int = 6


class LibraryContextRequest(BaseModel):
    prompt: str
    top_k: int = 5
    ollama: str = "http://localhost:11434"
    embed_model: str = "dengcao/Qwen3-Embedding-0.6B:F16"
    gen_model: str = "qwen3:4b"


def now_iso() -> str:
    return datetime.utcnow().isoformat(timespec="seconds") + "Z"


def slugify(name: str) -> str:
    cleaned = re.sub(r"[^a-zA-Z0-9\- ]+", "", name).strip().lower()
    cleaned = re.sub(r"\s+", "-", cleaned)
    return cleaned or f"lib-{uuid.uuid4().hex[:8]}"


def lib_dir(slug: str) -> Path:
    return LIB_ROOT / slug


def lib_json(slug: str) -> Path:
    return lib_dir(slug) / "library.json"


def stage_dir(slug: str) -> Path:
    path = lib_dir(slug) / "stage"
    path.mkdir(parents=True, exist_ok=True)
    return path


def indexes_dir(slug: str) -> Path:
    path = lib_dir(slug) / "indexes"
    path.mkdir(parents=True, exist_ok=True)
    return path


def default_library_data(name: str, slug: str) -> Dict[str, Any]:
    return {
        "id": uuid.uuid4().hex,
        "name": name,
        "slug": slug,
        "created_at": now_iso(),
        "files": [],
    }

def _read_json(path: Path) -> Dict[str, Any]:
    return json.loads(path.read_text(encoding="utf-8"))


def read_library(slug: str) -> Dict[str, Any]:
    path = lib_json(slug)
    if not path.exists():
        raise HTTPException(status_code=404, detail="Library not found")
    return _read_json(path)


def write_library(slug: str, data: Dict[str, Any]) -> None:
    path = lib_json(slug)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(data, indent=2), encoding="utf-8")
    tmp.replace(path)


def _line_count(path: Path) -> int:
    if not path.exists():
        return 0
    with path.open("r", encoding="utf-8", errors="ignore") as handle:
        return sum(1 for line in handle if line.strip())


def _file_uri(path_value: str) -> str:
    return f"file://{quote(path_value)}"


def _collect_library_paths(slug: str) -> Dict[str, Path]:
    base = lib_dir(slug)
    return {
        "base": base,
        "stage": stage_dir(slug),
        "corpus": base / "corpus.jsonl",
        "enhanced": base / "corpus.enhanced.jsonl",
        "shadow": base / "corpus.shadow.jsonl",
        "indexes": indexes_dir(slug),
        "shadow_index": indexes_dir(slug) / "shadow.index.faiss",
        "shadow_store": indexes_dir(slug) / "shadow.meta.jsonl",
        "content_index": indexes_dir(slug) / "content.index.faiss",
        "content_store": indexes_dir(slug) / "content.meta.jsonl",
    }


def library_payload(data: Dict[str, Any]) -> Dict[str, Any]:
    paths = _collect_library_paths(data["slug"])
    files = list(data.get("files", []))
    stages = {
        "has_files": len(files) > 0,
        "has_corpus": paths["corpus"].exists(),
        "is_enriched": paths["enhanced"].exists() and paths["shadow"].exists(),
        "is_indexed": paths["shadow_index"].exists() and paths["content_index"].exists(),
    }
    artifacts = {
        "corpus_records": _line_count(paths["corpus"]),
        "enhanced_records": _line_count(paths["enhanced"]),
        "shadow_records": _line_count(paths["shadow"]),
    }
    return {
        **data,
        "files": files,
        "states": stages,
        "artifacts": artifacts,
    }


def _walk_input_paths(paths: List[str]) -> List[Path]:
    out: List[Path] = []
    for raw in paths:
        current = Path(raw).expanduser().resolve()
        if not current.exists():
            continue
        if current.is_file():
            out.append(current)
            continue
        for child in current.rglob("*"):
            if child.is_file():
                out.append(child.resolve())
    return out


def _sha256_file(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _stage_name(sha: str, path: Path) -> str:
    safe_name = re.sub(r"[^A-Za-z0-9._-]+", "_", path.name).strip("._") or "file"
    return f"{sha}--{safe_name}"

def _job_public(job: Dict[str, Any]) -> Dict[str, Any]:
    return {
        "id": job["id"],
        "slug": job["slug"],
        "type": job["type"],
        "status": job["status"],
        "phase": job.get("phase"),
        "progress": job.get("progress", 0.0),
        "detail": job.get("detail", ""),
        "error": job.get("error"),
        "result": job.get("result"),
        "created_at": job["created_at"],
        "finished_at": job.get("finished_at"),
    }


def _has_active_job(slug: str) -> bool:
    return any(
        job["slug"] == slug and job["status"] in {"queued", "running"}
        for job in JOBS.values()
    )


def _load_pipeline_fn(module_name: str, attr: str):
    try:
        module = importlib.import_module(f"backend.rag.{module_name}")
    except ModuleNotFoundError:
        module = importlib.import_module(f".rag.{module_name}", package=__package__)
    return getattr(module, attr)


async def _run_job(job_id: str, fn_name: str, **kwargs):
    loop = asyncio.get_running_loop()
    job = JOBS[job_id]

    def on_progress(phase: str, pct: float, detail: str):
        job["phase"] = phase
        job["progress"] = round(float(pct) * 100.0, 1)
        job["detail"] = detail

    job["status"] = "running"
    try:
        if fn_name == "build":
            runner = _load_pipeline_fn("corpus_builder", "run_build")
        elif fn_name == "enrich":
            runner = _load_pipeline_fn("corpus_enricher", "run_enrich")
        elif fn_name == "embed":
            runner = _load_pipeline_fn("index_builder", "run_index")
        else:
            raise RuntimeError(f"Unknown job type: {fn_name}")

        call = functools.partial(runner, on_progress=on_progress, **kwargs)
        result = await loop.run_in_executor(JOB_EXECUTOR, call)
        job["status"] = "succeeded"
        job["progress"] = 100.0
        job["phase"] = "done"
        job["detail"] = "Completed."
        job["result"] = result
    except Exception as exc:
        job["status"] = "failed"
        job["error"] = f"{type(exc).__name__}: {exc}"
    finally:
        job["finished_at"] = now_iso()


def _start_job(slug: str, job_type: str, **kwargs) -> str:
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {
        "id": job_id,
        "slug": slug,
        "type": job_type,
        "status": "queued",
        "phase": "queued",
        "progress": 0.0,
        "detail": "",
        "created_at": now_iso(),
        "finished_at": None,
        "result": None,
        "error": None,
    }
    asyncio.create_task(_run_job(job_id, job_type, **kwargs))
    return job_id

def _build_local_context(prompt: str, results: Dict[str, Any], top_k: int = 5) -> Dict[str, Any]:
    sources = results.get("sources") or []
    selected = sources[: max(1, top_k)]
    if not selected:
        context_block = (
            "<local_rag_context>\n"
            "No useful results were found in the selected local knowledge base.\n"
            "</local_rag_context>"
        )
        return {"context_block": context_block, "sources": []}

    blocks: List[str] = ["<local_rag_context>"]
    file_sources: List[str] = []
    for idx, source in enumerate(selected, start=1):
        title = (source.get("title") or Path(source.get("url") or source.get("doc_id") or f"Source {idx}").name).strip()
        snippet = re.sub(r"\s+", " ", (source.get("snippet") or "")).strip()
        if len(snippet) > 1400:
            snippet = snippet[:1400].rstrip() + "..."
        raw_path = source.get("url") or source.get("doc_id") or ""
        if raw_path and os.path.isabs(raw_path):
            file_sources.append(_file_uri(raw_path))
        blocks.append(f"[L{idx}] {title}\n{snippet}")
    blocks.append("</local_rag_context>")
    blocks.append(
        "Use the local knowledge base context when it is relevant. "
        "If it does not answer the question, say so clearly instead of inventing details."
    )
    return {"context_block": "\n".join(blocks), "sources": file_sources}


@router.get("/libraries")
def list_libraries():
    libraries: List[Dict[str, Any]] = []
    for path in LIB_ROOT.iterdir():
        if not path.is_dir():
            continue
        meta = path / "library.json"
        if not meta.exists():
            continue
        try:
            libraries.append(library_payload(_read_json(meta)))
        except Exception:
            continue
    libraries.sort(key=lambda item: item.get("created_at", ""), reverse=True)
    return {"libraries": libraries}


@router.post("/libraries")
def create_library(req: CreateLibraryRequest):
    slug = slugify(req.name)
    base_slug = slug
    idx = 2
    while lib_dir(slug).exists():
        slug = f"{base_slug}-{idx}"
        idx += 1
    data = default_library_data(req.name, slug)
    stage_dir(slug)
    indexes_dir(slug)
    write_library(slug, data)
    return library_payload(data)


@router.get("/libraries/{slug}")
def get_library(slug: str):
    return library_payload(read_library(slug))


@router.patch("/libraries/{slug}")
def rename_library(slug: str, req: RenameLibraryRequest):
    data = read_library(slug)
    data["name"] = req.name.strip() or data["name"]
    write_library(slug, data)
    return library_payload(data)


@router.delete("/libraries/{slug}")
def delete_library(slug: str):
    path = lib_dir(slug)
    if not path.exists():
        raise HTTPException(status_code=404, detail="Library not found")
    shutil.rmtree(path)
    return {"ok": True}

@router.post("/libraries/{slug}/files/register")
def register_paths(slug: str, req: RegisterPathsRequest):
    data = read_library(slug)
    stage = stage_dir(slug)
    existing = {entry.get("sha256"): entry for entry in data.get("files", [])}
    added: List[Dict[str, Any]] = []
    skipped: List[str] = []

    for file_path in _walk_input_paths(req.paths):
        sha = _sha256_file(file_path)
        if sha in existing:
            skipped.append(str(file_path))
            continue
        stage_name = _stage_name(sha, file_path)
        symlink_path = stage / stage_name
        if symlink_path.exists():
            symlink_path.unlink()
        symlink_path.symlink_to(file_path)
        entry = {
            "sha256": sha,
            "path": str(file_path),
            "rel": stage_name,
            "name": file_path.name,
            "size": file_path.stat().st_size,
            "added_at": now_iso(),
        }
        data.setdefault("files", []).append(entry)
        added.append(entry)
        existing[sha] = entry

    write_library(slug, data)
    return {
        "added": added,
        "skipped": skipped,
        "library": library_payload(data),
    }


@router.delete("/libraries/{slug}/files")
def remove_file(slug: str, req: RemoveFileRequest):
    data = read_library(slug)
    before = len(data.get("files", []))
    data["files"] = [entry for entry in data.get("files", []) if entry.get("rel") != req.rel]
    symlink_path = stage_dir(slug) / req.rel
    if symlink_path.exists():
        symlink_path.unlink()
    write_library(slug, data)
    if len(data["files"]) == before:
        raise HTTPException(status_code=404, detail="File not found")
    return {"ok": True, "library": library_payload(data)}


@router.post("/libraries/{slug}/jobs/build")
async def build_library(slug: str):
    data = read_library(slug)
    if not data.get("files"):
        raise HTTPException(status_code=400, detail="Add files before building a library.")
    lock = LIB_LOCKS.setdefault(slug, asyncio.Lock())
    async with lock:
        if _has_active_job(slug):
            raise HTTPException(status_code=409, detail="This library already has an active job.")
        job_id = _start_job(
            slug,
            "build",
            root=stage_dir(slug),
            out=_collect_library_paths(slug)["corpus"],
        )
    return {"job_id": job_id}


@router.post("/libraries/{slug}/jobs/enrich")
async def enrich_library(slug: str):
    paths = _collect_library_paths(slug)
    if not paths["corpus"].exists():
        raise HTTPException(status_code=400, detail="Build the corpus before enrichment.")
    lock = LIB_LOCKS.setdefault(slug, asyncio.Lock())
    async with lock:
        if _has_active_job(slug):
            raise HTTPException(status_code=409, detail="This library already has an active job.")
        job_id = _start_job(
            slug,
            "enrich",
            inp=paths["corpus"],
            out=paths["enhanced"],
            shadow_out=paths["shadow"],
        )
    return {"job_id": job_id}

@router.post("/libraries/{slug}/jobs/embed")
async def embed_library(slug: str, req: EmbedLibraryRequest):
    paths = _collect_library_paths(slug)
    if not paths["corpus"].exists():
        raise HTTPException(status_code=400, detail="Build the corpus before indexing.")
    lock = LIB_LOCKS.setdefault(slug, asyncio.Lock())
    async with lock:
        if _has_active_job(slug):
            raise HTTPException(status_code=409, detail="This library already has an active job.")
        job_id = _start_job(
            slug,
            "embed",
            raw=paths["corpus"],
            enhanced=paths["enhanced"] if paths["enhanced"].exists() else None,
            shadow=paths["shadow"] if paths["shadow"].exists() else None,
            out_dir=paths["indexes"],
            embed_model=req.embed_model,
            ollama=req.ollama,
            target_chars=req.target_chars,
            overlap_chars=req.overlap_chars,
            concurrency=req.concurrency,
        )
    return {"job_id": job_id}


@router.get("/jobs")
def list_jobs(slug: Optional[str] = None):
    jobs = [_job_public(job) for job in JOBS.values() if slug is None or job["slug"] == slug]
    jobs.sort(key=lambda item: item.get("created_at", ""), reverse=True)
    return {"jobs": jobs}


@router.get("/jobs/{job_id}")
def get_job(job_id: str):
    job = JOBS.get(job_id)
    if not job:
        raise HTTPException(status_code=404, detail="Job not found")
    return _job_public(job)


@router.post("/libraries/{slug}/context")
def library_context(slug: str, req: LibraryContextRequest):
    paths = _collect_library_paths(slug)
    if not paths["shadow_index"].exists() or not paths["content_index"].exists():
        raise HTTPException(status_code=400, detail="Index the library before using it in chat.")
    try:
        run_query = _load_pipeline_fn("unified_rag", "run_query")
        result = run_query(
            shadow_index=paths["shadow_index"],
            shadow_store=paths["shadow_store"],
            content_index=paths["content_index"],
            content_store=paths["content_store"],
            query=req.prompt,
            answer=False,
            ollama=req.ollama,
            embed_model=req.embed_model,
            gen_model=req.gen_model,
            no_rerank=True,
            k=max(1, req.top_k),
        )
    except Exception as exc:
        raise HTTPException(status_code=500, detail=f"Local retrieval failed: {type(exc).__name__}: {exc}") from exc

    context = _build_local_context(req.prompt, result, top_k=req.top_k)
    return {
        "context_block": context["context_block"],
        "sources": context["sources"],
        "result": result,
    }
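As a standalone illustration of the staging scheme used by `register_paths` (files are deduplicated by SHA-256 and symlinked into `stage/` under a `<sha256>--<sanitized name>` filename), here is a small sketch. `stage_name_for` is a hypothetical helper that mirrors `_sha256_file` plus `_stage_name`; it is not part of the module:

```python
import hashlib
import re

# Hypothetical helper mirroring _sha256_file + _stage_name: content is
# hashed, and the original filename is sanitized into the staged name.
def stage_name_for(data: bytes, original_name: str) -> str:
    sha = hashlib.sha256(data).hexdigest()
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", original_name).strip("._") or "file"
    return f"{sha}--{safe}"

name = stage_name_for(b"hello", "My Report (final).pdf")
```

Because the staged name starts with the content hash, re-registering the same bytes under a different filename is detected as a duplicate before any symlink is created.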
@@ -8,6 +8,7 @@ import html
 import json
 from . import models, schemas
 from .database import Base, engine, SessionLocal, ensure_sources_column
+from .local_rag import router as local_rag_router
 from .ollama_client import list_models as ollama_list, chat as ollama_chat, chat_stream as ollama_chat_stream
 from .websearch import enrich_prompt
 
@@ -25,6 +26,7 @@ app.add_middleware(
     allow_methods=["*"],
     allow_headers=["*"],
 )
+app.include_router(local_rag_router)
 
 def get_db():
     db = SessionLocal()
@@ -331,8 +333,11 @@ async def websearch_route(req: schemas.WebSearchRequest):
             searx_url=req.searx_url,
             engines=req.engines,
         )
-        return {"enriched_prompt": enriched, "sources": sources}
+        context_block = ""
+        if "<websearch_context>" in enriched:
+            context_block = enriched[enriched.index("<websearch_context>"):].strip()
+        return {"enriched_prompt": enriched, "sources": sources, "context_block": context_block}
     except Exception:
-        return {"enriched_prompt": req.prompt, "sources": []}
+        return {"enriched_prompt": req.prompt, "sources": [], "context_block": ""}
 
 # To run standalone: python -m uvicorn backend.main:app --host 127.0.0.1 --port 8000
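The websearch hunk above carves the `<websearch_context>` block out of the enriched prompt so the client can handle it separately from the prompt text. A sketch of that extraction in isolation (the function name is illustrative, not part of `backend/main.py`):

```python
# Everything from the first <websearch_context> tag onward becomes the
# context_block; prompts without the tag yield an empty block.
def extract_context_block(enriched: str) -> str:
    marker = "<websearch_context>"
    if marker in enriched:
        return enriched[enriched.index(marker):].strip()
    return ""

block = extract_context_block(
    "What is FAISS?\n\n<websearch_context>\n[1] faiss.ai\n</websearch_context>"
)
```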
0
backend/rag/__init__.py
Normal file
1741
backend/rag/corpus_builder.py
Normal file
File diff suppressed because it is too large
1048
backend/rag/corpus_enricher.py
Normal file
File diff suppressed because it is too large
525
backend/rag/index_builder.py
Normal file
@@ -0,0 +1,525 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
03_index_builder.py

Flexible FAISS index builder for hybrid RAG.

Supports these inputs (any subset):
  - --raw      : corpus.jsonl from 01_corpus_builder.py (no enrichment)
  - --enhanced : corpus.enhanced.jsonl from 02_corpus_enricher.py
  - --shadow   : corpus.shadow.jsonl from 02_corpus_enricher.py

Outputs (by default into ./indexes):
  - shadow.index.faiss  : FAISS IP index over vectors of "shadow_text"
  - shadow.meta.jsonl   : metadata for each FAISS id (id, doc_id, record_id, title, url, record_type, mime, lang, kind, shadow_text)
  - content.index.faiss : FAISS IP index over vectors of chunked "text"
  - content.meta.jsonl  : metadata for each FAISS id (id, doc_id, record_id, chunk_no, title, url, text, record_type, mime, lang)

Behavior
  - If you provide --shadow → build shadow from it.
  - Else if you provide --enhanced → synthesize shadow from enriched fields (headline+summary+keywords+entities+qa).
  - Else if you provide --raw → synthesize shadow from raw (title + first sentences + hints).
  - If you provide --enhanced → build content from it.
  - Else if you provide --raw → build content from raw text (chunking).
  - You can disable either side with --no-shadow or --no-content.

Embedding
  - Uses Ollama /api/embeddings with cosine similarity (L2-normalize then IP).

Examples:

  # Full hybrid from enriched+shadow
  python 03_index_builder.py \
      --enhanced corpus.enhanced.jsonl \
      --shadow corpus.shadow.jsonl \
      --out-dir indexes \
      --embed-model "dengcao/Qwen3-Embedding-0.6B:F16" \
      --target-chars 2500 --overlap-chars 200 \
      --concurrency 6

  # Raw-only (no enricher) → builds content from raw text and a proxy shadow
  python 03_index_builder.py \
      --raw corpus.jsonl \
      --out-dir indexes \
      --embed-model "dengcao/Qwen3-Embedding-0.6B:F16"

"""
from __future__ import annotations

import argparse, json, sys, uuid, os, re
from pathlib import Path
from typing import Dict, Any, Iterable, List, Tuple, Optional, Callable
from concurrent.futures import ThreadPoolExecutor, as_completed
import threading
import numpy as np
import requests
import faiss
from tqdm import tqdm

# -----------------------------
# IO
# -----------------------------
def read_jsonl(path: Path) -> Iterable[Dict[str, Any]]:
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                try:
                    yield json.loads(line)
                except Exception:
                    continue

def ensure_dir(p: Path):
    p.mkdir(parents=True, exist_ok=True)

# -----------------------------
# Text helpers
# -----------------------------
def pick_text(rec: Dict[str, Any]) -> str:
    return rec.get("text") or rec.get("content") or rec.get("body") or ""

def first_sentences(s: str, max_chars: int = 500) -> str:
    s = (s or "").strip()
    if not s:
        return ""
    # cheap sentence-ish split
    parts = re.split(r"(?<=[\.\!\?])\s+", s)
    out = []
    total = 0
    for p in parts:
        if not p:
            continue
        out.append(p)
        total += len(p) + 1
        if total >= max_chars:
            break
    joined = " ".join(out).strip()
    return joined[:max_chars].rstrip()

def chunk_text(txt: str, target_chars: int = 2500, overlap_chars: int = 200) -> Iterable[str]:
    # paragraph-first greedy pack
    paras = [p.strip() for p in (txt or "").split("\n\n") if p.strip()]
    if not paras:
        if txt.strip():
            yield txt.strip()
        return
    buf, size = [], 0
    for p in paras:
        if size + len(p) + 2 > target_chars and buf:
            chunk = "\n\n".join(buf)
            yield chunk
            if overlap_chars > 0 and len(chunk) > overlap_chars:
                tail = chunk[-overlap_chars:]
                buf, size = [tail], len(tail)
            else:
                buf, size = [], 0
        buf.append(p)
        size += len(p) + 2
    if buf:
        yield "\n\n".join(buf)

def norm_f32(mat: np.ndarray) -> np.ndarray:
    mat = np.asarray(mat, dtype="float32")
    norms = np.linalg.norm(mat, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return mat / norms

# -----------------------------
# Embedding
# -----------------------------
def embed_many(ollama_url: str, model: str, texts: List[str], *, concurrency: int = 4, timeout: int = 120, on_progress=None) -> List[np.ndarray]:
|
||||
def _embed_one(t: str) -> np.ndarray:
|
||||
r = requests.post(f"{ollama_url.rstrip('/')}/api/embeddings", json={"model": model, "prompt": t}, timeout=timeout)
|
||||
r.raise_for_status()
|
||||
data = r.json()
|
||||
vec = data.get("embedding") or (data.get("embeddings") or [None])[0]
|
||||
if vec is None:
|
||||
raise RuntimeError("No 'embedding' in response")
|
||||
return np.array(vec, dtype="float32")
|
||||
|
||||
out: List[Optional[np.ndarray]] = [None] * len(texts)
|
||||
with ThreadPoolExecutor(max_workers=max(1, concurrency)) as ex:
|
||||
futures = {ex.submit(_embed_one, t): i for i, t in enumerate(texts)}
|
||||
|
||||
progress_bar = None
|
||||
if on_progress is None and 'tqdm' in globals() and tqdm is not None:
|
||||
progress_bar = tqdm(as_completed(futures), total=len(futures), desc="embed")
|
||||
|
||||
iterator = progress_bar if progress_bar else as_completed(futures)
|
||||
|
||||
count = 0
|
||||
for fut in iterator:
|
||||
i = futures[fut]
|
||||
out[i] = fut.result()
|
||||
count += 1
|
||||
if on_progress:
|
||||
on_progress("embed", count / len(texts), f"Embedding {count}/{len(texts)}")
|
||||
|
||||
# type: ignore
|
||||
return out # List[np.ndarray]
|
||||
|
||||


# -----------------------------
# Meta helpers
# -----------------------------
def derive_doc_id_from_any(any_id: Optional[str], parent_id: Optional[str]) -> str:
    """Prefer parent_id if present (file-level), else the base of 'id' before '#...'."""
    if parent_id:
        return str(parent_id)
    if not any_id:
        return ""
    return any_id.split("#", 1)[0]


def kind_from_rec(rec: Dict[str, Any]) -> str:
    rt = (rec.get("record_type") or "").lower()
    mime = (rec.get("mime") or "").lower()
    if rt == "image" or mime.startswith("image/"):
        return "image"
    if rt == "av" or mime.startswith(("audio/", "video/")):
        return "av"
    if "html" in mime or rt in {"html-section"}:
        return "html"
    if "pdf" in mime or rt == "page":
        return "pdf"
    if rt == "code-summary" or mime.startswith("text/x-code"):
        return "code"
    return rt or "file"
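For illustration, the doc-id rule behaves like this (a minimal re-implementation so the snippet stands alone; the sample ids are hypothetical):

```python
from typing import Optional

def derive_doc_id_from_any(any_id: Optional[str], parent_id: Optional[str]) -> str:
    # parent_id (file-level) wins; otherwise strip the chunk suffix after '#'.
    if parent_id:
        return str(parent_id)
    if not any_id:
        return ""
    return any_id.split("#", 1)[0]

a = derive_doc_id_from_any("abc123#p4", None)      # chunk id -> file-level "abc123"
b = derive_doc_id_from_any("abc123#p4", "doc-9")   # explicit parent_id wins
```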


# -----------------------------
# Shadow text synthesis (fallbacks)
# -----------------------------
def synth_shadow_from_enhanced(rec: Dict[str, Any]) -> str:
    """Build a compact shadow_text from enriched fields if present."""
    parts: List[str] = []
    h = (rec.get("headline") or rec.get("title") or "").strip()
    s = (rec.get("summary") or "").strip()
    kws = rec.get("keywords") or []
    ents = rec.get("entities") or []
    qas = rec.get("qa") or []

    if h:
        parts.append(f"headline: {h}")
    if s:
        parts.append(f"summary: {s}")
    if kws:
        parts.append("keywords: " + ", ".join([str(k).strip() for k in kws if str(k).strip()]))
    if ents:
        uniq = {}
        for e in ents:
            if not isinstance(e, dict):
                continue
            name = (e.get("name") or "").strip()
            typ = (e.get("type") or "OTHER").strip().upper()
            if name and name.lower() not in uniq:
                uniq[name.lower()] = (name, typ)
        if uniq:
            parts.append("entities: " + "; ".join(f"{n} [{t}]" for n, t in uniq.values()))
    if qas:
        qa_lines = []
        for qa in qas[:4]:
            if not isinstance(qa, dict):
                continue
            q = (qa.get("q") or "").strip()
            a = (qa.get("a") or "").strip()
            if q and a:
                qa_lines.append(f"Q: {q}\nA: {a}")
        if qa_lines:
            parts.append("qa:\n" + "\n".join(qa_lines))
    return "\n".join(parts).strip()


def synth_shadow_from_raw(rec: Dict[str, Any]) -> str:
    """Build a proxy shadow_text without any LLM: title + first sentences + light hints."""
    title = (rec.get("title") or "").strip()
    text = pick_text(rec)
    kind = kind_from_rec(rec)
    url = rec.get("url") or rec.get("source_path") or ""
    head = f"headline: {title}" if title else ""
    summary = first_sentences(text, 500)
    parts = []
    if head:
        parts.append(head)
    if summary:
        parts.append(f"summary: {summary}")
    hints = []
    if kind:
        hints.append(kind)
    if rec.get("mime"):
        hints.append(rec.get("mime").split(";")[0])
    if url:
        hints.append(Path(url).name)
    if hints:
        parts.append("keywords: " + ", ".join(hints))
    return "\n".join(parts).strip()
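As a concrete example of the shadow format, here is a trimmed re-implementation covering only the headline/summary/keywords fields (a sketch, not the full function above; the sample record is hypothetical):

```python
def synth_shadow(rec: dict) -> str:
    # Compact "field: value" lines, one per enriched field that is present.
    parts = []
    h = (rec.get("headline") or rec.get("title") or "").strip()
    s = (rec.get("summary") or "").strip()
    kws = rec.get("keywords") or []
    if h:
        parts.append(f"headline: {h}")
    if s:
        parts.append(f"summary: {s}")
    if kws:
        parts.append("keywords: " + ", ".join(str(k).strip() for k in kws))
    return "\n".join(parts)

st = synth_shadow({"title": "Tit for Tat", "summary": "Cooperation evolves.", "keywords": ["game theory"]})
```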


# -----------------------------
# Builders
# -----------------------------
def build_shadow_any(
    shadow_jsonl: Optional[Path],
    enhanced_jsonl: Optional[Path],
    raw_jsonl: Optional[Path],
    out_index: Path,
    out_meta: Path,
    *,
    ollama: str,
    model: str,
    concurrency: int,
) -> Tuple[int, int, int]:
    """
    Build FAISS over shadow_text from the best available source.
    Priority: shadow_jsonl > enhanced_jsonl (synth) > raw_jsonl (synth).
    Returns (n_input_records, n_indexed, dim).
    """
    src_records: List[Dict[str, Any]] = []
    mode = ""
    if shadow_jsonl and shadow_jsonl.exists():
        src_records = list(read_jsonl(shadow_jsonl))
        mode = "shadow"
    elif enhanced_jsonl and enhanced_jsonl.exists():
        src_records = list(read_jsonl(enhanced_jsonl))
        mode = "enhanced->shadow"
    elif raw_jsonl and raw_jsonl.exists():
        src_records = list(read_jsonl(raw_jsonl))
        mode = "raw->shadow"
    else:
        raise SystemExit("[ERR] No input for shadow index (need --shadow OR --enhanced OR --raw).")

    if not src_records:
        raise SystemExit("[ERR] Empty input for shadow index.")

    texts: List[str] = []
    metas: List[Dict[str, Any]] = []
    for rec in src_records:
        if mode == "shadow":
            st = rec.get("shadow_text") or ""
        elif mode == "enhanced->shadow":
            st = synth_shadow_from_enhanced(rec)
        else:
            st = synth_shadow_from_raw(rec)

        if not st.strip():
            continue

        record_id = rec.get("id") or rec.get("record_id") or str(uuid.uuid4())
        doc_id = derive_doc_id_from_any(record_id, rec.get("parent_id"))

        meta = {
            "id": None,  # numeric FAISS id, assigned below
            "record_id": record_id,
            "doc_id": doc_id,
            "title": rec.get("title"),
            "url": rec.get("url") or rec.get("source_path"),
            "record_type": rec.get("record_type"),
            "mime": rec.get("mime"),
            "lang": rec.get("lang"),
            "kind": kind_from_rec(rec),
            "shadow_text": st,
        }
        metas.append(meta)
        texts.append(st)

    if not texts:
        raise SystemExit("[ERR] no shadow_text to embed")

    vecs = embed_many(ollama, model, texts, concurrency=concurrency)
    d = len(vecs[0])
    mat = norm_f32(np.vstack(vecs))

    base = faiss.IndexFlatIP(d)
    index = faiss.IndexIDMap2(base)

    out_meta.parent.mkdir(parents=True, exist_ok=True)
    with open(out_meta, "w", encoding="utf-8") as mf:
        buf_vecs, buf_ids = [], []
        next_id = 0
        for m, v in zip(metas, mat):
            m["id"] = next_id
            mf.write(json.dumps(m, ensure_ascii=False) + "\n")
            buf_vecs.append(v)
            buf_ids.append(next_id)
            next_id += 1
            if len(buf_vecs) >= 512:
                index.add_with_ids(np.vstack(buf_vecs), np.array(buf_ids, dtype="int64"))
                buf_vecs, buf_ids = [], []
        if buf_vecs:
            index.add_with_ids(np.vstack(buf_vecs), np.array(buf_ids, dtype="int64"))

    faiss.write_index(index, str(out_index))
    return (len(src_records), index.ntotal, d)
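Why `IndexFlatIP` over normalized vectors: the inner product of unit vectors equals cosine similarity. A numpy-only sketch of the same search, so the idea is visible without FAISS (assumes only `numpy`; the toy corpus is made up):

```python
import numpy as np

def norm_rows(m):
    m = np.asarray(m, dtype="float32")
    n = np.linalg.norm(m, axis=1, keepdims=True)
    n[n == 0] = 1.0
    return m / n

corpus = norm_rows([[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]])
query = norm_rows([[1.0, 0.1]])
sims = corpus @ query.T        # inner product == cosine for unit vectors
best = int(np.argmax(sims))    # nearest row by cosine similarity
```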


def build_content_any(
    enhanced_jsonl: Optional[Path],
    raw_jsonl: Optional[Path],
    out_index: Path,
    out_meta: Path,
    *,
    ollama: str,
    model: str,
    target_chars: int,
    overlap_chars: int,
    concurrency: int,
) -> Tuple[int, int, int]:
    """
    Build FAISS over chunked 'text' from the best available source.
    Priority: enhanced_jsonl > raw_jsonl.
    Returns (n_input_records, n_chunks, dim).
    """
    src_records: List[Dict[str, Any]] = []
    mode = ""
    if enhanced_jsonl and enhanced_jsonl.exists():
        src_records = list(read_jsonl(enhanced_jsonl))
        mode = "enhanced"
    elif raw_jsonl and raw_jsonl.exists():
        src_records = list(read_jsonl(raw_jsonl))
        mode = "raw"
    else:
        raise SystemExit("[ERR] No input for content index (need --enhanced OR --raw).")

    metas: List[Dict[str, Any]] = []
    texts: List[str] = []
    for rec in src_records:
        base_text = pick_text(rec)
        if not base_text.strip():
            continue
        record_id = rec.get("id") or rec.get("record_id") or str(uuid.uuid4())
        doc_id = derive_doc_id_from_any(record_id, rec.get("parent_id"))
        title = rec.get("title")
        url = rec.get("url") or rec.get("source_path")

        chunks = list(chunk_text(base_text, target_chars, overlap_chars))
        if not chunks:
            continue
        for ci, chunk in enumerate(chunks):
            meta = {
                "id": None,  # numeric FAISS id, assigned below
                "doc_id": doc_id,
                "record_id": record_id,
                "chunk_no": ci,
                "title": title,
                "url": url,
                "text": chunk,
                "record_type": rec.get("record_type"),
                "mime": rec.get("mime"),
                "lang": rec.get("lang"),
            }
            metas.append(meta)
            texts.append(chunk)

    if not texts:
        raise SystemExit("[ERR] no content chunks to embed")

    vecs = embed_many(ollama, model, texts, concurrency=concurrency)
    d = len(vecs[0])
    mat = norm_f32(np.vstack(vecs))

    base = faiss.IndexFlatIP(d)
    index = faiss.IndexIDMap2(base)

    out_meta.parent.mkdir(parents=True, exist_ok=True)
    with open(out_meta, "w", encoding="utf-8") as mf:
        buf_vecs, buf_ids = [], []
        next_id = 0
        for m, v in zip(metas, mat):
            m["id"] = next_id
            mf.write(json.dumps(m, ensure_ascii=False) + "\n")
            buf_vecs.append(v)
            buf_ids.append(next_id)
            next_id += 1
            if len(buf_vecs) >= 512:
                index.add_with_ids(np.vstack(buf_vecs), np.array(buf_ids, dtype="int64"))
                buf_vecs, buf_ids = [], []
        if buf_vecs:
            index.add_with_ids(np.vstack(buf_vecs), np.array(buf_ids, dtype="int64"))

    faiss.write_index(index, str(out_index))
    return (len(src_records), index.ntotal, d)


# -----------------------------
# CLI
# -----------------------------
def run_index(raw: Path | None, enhanced: Path | None, shadow: Path | None, out_dir: Path, *,
              on_progress=None, **opts) -> dict:
    args = argparse.Namespace(
        raw=raw,
        enhanced=enhanced,
        shadow=shadow,
        out_dir=out_dir,
        embed_model=opts.get("embed_model", "dengcao/Qwen3-Embedding-0.6B:F16"),
        ollama=opts.get("ollama", "http://localhost:11434"),
        target_chars=opts.get("target_chars", 2500),
        overlap_chars=opts.get("overlap_chars", 200),
        concurrency=opts.get("concurrency", 6),
        no_shadow=opts.get("no_shadow", False),
        no_content=opts.get("no_content", False),
    )

    ensure_dir(out_dir)

    shadow_index_path = out_dir / "shadow.index.faiss"
    shadow_meta_path = out_dir / "shadow.meta.jsonl"
    content_index_path = out_dir / "content.index.faiss"
    content_meta_path = out_dir / "content.meta.jsonl"

    results = {}
    built_any = False

    if not args.no_shadow:
        if on_progress: on_progress("shadow", 0.1, "Building shadow index...")
        s_tot, s_ix, s_dim = build_shadow_any(
            args.shadow, args.enhanced, args.raw,
            shadow_index_path, shadow_meta_path,
            ollama=args.ollama, model=args.embed_model, concurrency=args.concurrency
        )
        results["shadow"] = {"records": s_tot, "indexed": s_ix, "dim": s_dim}
        if on_progress: on_progress("shadow", 0.5, "Shadow index complete.")
        built_any = True

    if not args.no_content:
        if on_progress: on_progress("content", 0.6, "Building content index...")
        c_tot, c_ix, c_dim = build_content_any(
            args.enhanced, args.raw,
            content_index_path, content_meta_path,
            ollama=args.ollama, model=args.embed_model,
            target_chars=args.target_chars, overlap_chars=args.overlap_chars,
            concurrency=args.concurrency
        )
        results["content"] = {"records": c_tot, "chunks": c_ix, "dim": c_dim}
        if on_progress: on_progress("content", 0.9, "Content index complete.")
        built_any = True

    if not built_any:
        return {"status": "warning", "message": "Nothing built."}

    if on_progress: on_progress("done", 1.0, "Indexing complete.")
    return {"status": "ok", "results": results}


def main():
    ap = argparse.ArgumentParser(description="Build FAISS indexes (shadow + content) for hybrid RAG, with or without enrichment.")
    ap.add_argument("--raw", help="Raw corpus JSONL (from 01_corpus_builder.py)")
    ap.add_argument("--enhanced", help="Enhanced corpus JSONL (from 02_corpus_enricher.py)")
    ap.add_argument("--shadow", help="Shadow corpus JSONL (from 02_corpus_enricher.py)")
    ap.add_argument("--out-dir", default="indexes", help="Output directory for indexes + metadata")
    ap.add_argument("--embed-model", default="dengcao/Qwen3-Embedding-0.6B:F16", help="Ollama embedding model")
    ap.add_argument("--ollama", default="http://localhost:11434", help="Ollama base URL")
    ap.add_argument("--target-chars", type=int, default=2500, help="Chunk size for content index")
    ap.add_argument("--overlap-chars", type=int, default=200, help="Overlap size for content index")
    ap.add_argument("--concurrency", type=int, default=6, help="Parallel HTTP workers for embeddings")
    ap.add_argument("--no-shadow", action="store_true", help="Do not build shadow index")
    ap.add_argument("--no-content", action="store_true", help="Do not build content index")
    args = ap.parse_args()

    # Drop the positional keys before splatting; otherwise run_index() receives
    # raw/enhanced/shadow/out_dir twice and raises a TypeError.
    opts = {k: v for k, v in vars(args).items() if k not in {"raw", "enhanced", "shadow", "out_dir"}}
    run_index(
        Path(args.raw) if args.raw else None,
        Path(args.enhanced) if args.enhanced else None,
        Path(args.shadow) if args.shadow else None,
        Path(args.out_dir),
        on_progress=lambda p, pct, d: print(f"[{p}] {pct*100:.1f}%: {d}"),
        **opts,
    )


if __name__ == "__main__":
    main()
687	backend/rag/unified_rag.py	Normal file
@@ -0,0 +1,687 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
04_unified_rag.py

Hybrid retrieval + (optional) rerank + (optional) answer generation.

Supports:
- HYBRID: shadow + content indexes (best quality)
- SINGLE-INDEX:
    * legacy pair (--index/--store)  <- back-compat
    * content-only pair (--content-index/--content-store)
    * shadow-only pair (--shadow-index/--shadow-store)

If you skipped enrichment:
- Build only content + proxy shadow with 03_index_builder.py (raw -> content; raw/enhanced -> proxy shadow)
- Query with:
    * HYBRID: provide both pairs
    * SINGLE-INDEX: provide only one pair (content OR shadow)
"""
from __future__ import annotations

import argparse, json, os, sys, subprocess, math
from pathlib import Path
from typing import List, Dict, Tuple, Optional

import faiss
import numpy as np
import requests
import threading
from typing import Callable


# -----------------------------
# Utilities
# -----------------------------
def norm_f32(mat: np.ndarray) -> np.ndarray:
    mat = np.asarray(mat, dtype="float32")
    norms = np.linalg.norm(mat, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return mat / norms


def zscore(x: List[float]) -> List[float]:
    if not x:
        return []
    mu = float(np.mean(x))
    sd = float(np.std(x))
    if sd == 0.0:
        return [0.0 for _ in x]
    return [(v - mu) / sd for v in x]
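`zscore` puts ANN and reranker scores on a common scale before blending. A quick check of the behavior (a standalone sketch using `statistics.pstdev`, which matches `np.std`'s population convention):

```python
import statistics

def zscore(x):
    if not x:
        return []
    mu = statistics.mean(x)
    sd = statistics.pstdev(x)  # population std, like np.std above
    if sd == 0.0:
        return [0.0 for _ in x]
    return [(v - mu) / sd for v in x]

z = zscore([1.0, 2.0, 3.0])
# mean 2.0, population std ~0.8165 -> roughly [-1.22, 0.0, 1.22]
```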


def sanitize(s: Optional[str]) -> str:
    if not s:
        return ""
    import re
    s = re.sub(r"<\s*think\s*>.*?<\s*/\s*think\s*>", "", s, flags=re.S | re.I)
    s = re.sub(r"^\s*```(?:\w+)?\s*|\s*```\s*$", "", s, flags=re.M)
    s = re.sub(r"[ \t]+", " ", s)
    s = re.sub(r"\n{3,}", "\n\n", s)
    return s.strip()
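`sanitize` exists mainly to strip leaked `<think>` scratchpads and code fences from model output. The think-stripping step in isolation (same regex as above):

```python
import re

def strip_think(s: str) -> str:
    # Drop <think>...</think> blocks, case-insensitively, across newlines.
    return re.sub(r"<\s*think\s*>.*?<\s*/\s*think\s*>", "", s, flags=re.S | re.I).strip()

clean = strip_think("<think>secret scratchpad</think>The answer is 42.")
```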


def pick_any_text(rec: Dict) -> str:
    """Use 'text' if present, else 'shadow_text', for rerank/answer/pretty output."""
    return rec.get("text") or rec.get("shadow_text") or rec.get("content") or rec.get("body") or ""


def embed_query(ollama_url: str, model: str, text: str, timeout_s: int = 60) -> np.ndarray:
    r = requests.post(
        f"{ollama_url.rstrip('/')}/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=timeout_s,
    )
    r.raise_for_status()
    data = r.json()
    vec = data.get("embedding") or (data.get("embeddings") or [None])[0]
    if vec is None:
        raise RuntimeError("Ollama /api/embeddings returned no vector.")
    return np.array(vec, dtype="float32")


def load_meta(store_path: str) -> Dict[int, Dict]:
    id2meta: Dict[int, Dict] = {}
    with open(store_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            rec = json.loads(line)
            id2meta[int(rec["id"])] = rec
    return id2meta


def truncate_text(s: Optional[str], limit: int) -> str:
    if not s:
        return ""
    return s if len(s) <= limit else s[:limit]


def derive_doc_id(rec: Dict) -> str:
    # Prefer an explicit doc_id if provided by the meta builder.
    did = rec.get("doc_id")
    if did:
        return did
    rid = rec.get("record_id") or rec.get("id") or ""
    return rid.split("#", 1)[0]


# -----------------------------
# Rerank (subprocess worker)
# -----------------------------
def sentence_transformers_available() -> bool:
    try:
        import importlib.util as _ilu
        spec = _ilu.find_spec("sentence_transformers")
        return spec is not None
    except Exception:
        return False


def rerank_subprocess(
    query: str,
    docs: List[str],
    *,
    worker_path: Path,
    model: str,
    device: str,
    dtype: str,
    batch: int,
    maxlen: int,
) -> Optional[List[Tuple[int, float]]]:
    """
    Call this same script with --mode rerank-worker in a clean Python subprocess.
    Returns a list of (local_index, score) sorted descending, or None on failure.
    """
    payload = {"query": query, "docs": docs}
    cmd = [
        sys.executable,
        str(worker_path),
        "--mode", "rerank-worker",
        "--rerank-model", model,
        "--rerank-device", device,
        "--rerank-dtype", dtype,
        "--rerank-batch", str(batch),
        "--rerank-maxlen", str(maxlen),
        "--stdio",
    ]
    env = os.environ.copy()
    env.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")
    env.setdefault("TOKENIZERS_PARALLELISM", "false")
    env.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")

    try:
        proc = subprocess.run(
            cmd,
            input=json.dumps(payload).encode("utf-8"),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=False,
            env=env,
        )
    except Exception as e:
        sys.stderr.write(f"[rerank] failed to launch worker: {e}\n")
        return None

    if proc.returncode != 0:
        sys.stderr.write(proc.stderr.decode("utf-8", errors="ignore") + "\n")
        return None

    try:
        data = json.loads(proc.stdout.decode("utf-8"))
        results = data.get("results") or []
        pairs = [(int(r["index"]), float(r["score"])) for r in results]
        pairs.sort(key=lambda x: x[1], reverse=True)
        return pairs
    except Exception as e:
        sys.stderr.write(f"[rerank] parse error: {e}\n")
        return None


# -----------------------------
# Simple diversity (per-source cap)
# -----------------------------
def apply_per_source_cap(ordered: List[Dict], per_source_limit: int) -> List[Dict]:
    if per_source_limit <= 0:
        return ordered
    counts = {}
    out = []
    for rec in ordered:
        key = rec.get("url") or rec.get("doc_id") or rec.get("title") or str(rec.get("id"))
        c = counts.get(key, 0)
        if c < per_source_limit:
            out.append(rec)
            counts[key] = c + 1
    return out
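The per-source cap is a cheap diversity pass: keep the ranked order, but admit at most N results per source key. A standalone sketch with toy hits:

```python
def apply_per_source_cap(ordered, per_source_limit):
    # Same idea as above, keyed on "url" only for brevity.
    if per_source_limit <= 0:
        return ordered
    counts, out = {}, []
    for rec in ordered:
        key = rec.get("url")
        c = counts.get(key, 0)
        if c < per_source_limit:
            out.append(rec)
            counts[key] = c + 1
    return out

hits = [{"url": "a"}, {"url": "a"}, {"url": "a"}, {"url": "b"}]
capped = apply_per_source_cap(hits, 2)
# keeps two hits from "a", one from "b"
```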


# -----------------------------
# Generation
# -----------------------------
def generate(
    ollama_url: str,
    model: str,
    prompt: str,
    system: Optional[str] = None,
    temperature: float = 0.2,
    timeout_s: int = 180,
    on_stream=None,
) -> str:
    payload = {"model": model, "prompt": prompt, "options": {"temperature": temperature}}
    if system:
        payload["system"] = system
    r = requests.post(f"{ollama_url.rstrip('/')}/api/generate", json=payload, timeout=timeout_s, stream=True)
    r.raise_for_status()
    out = []
    for chunk in r.iter_lines(decode_unicode=True):
        if not chunk:
            continue
        try:
            obj = json.loads(chunk)
            delta = obj.get("response", "")
            out.append(delta)
            if on_stream:
                on_stream({"delta": delta})
            if obj.get("done"):
                break
        except Exception:
            pass
    return sanitize("".join(out))


# -----------------------------
# Search helpers
# -----------------------------
def faiss_search(index: faiss.Index, qvec: np.ndarray, k: int) -> Tuple[List[int], List[float]]:
    sims, ids = index.search(qvec, k)
    ids = [int(i) for i in ids[0] if i != -1]
    sims = [float(s) for s in sims[0][: len(ids)]]
    return ids, sims


# -----------------------------
# Output / answer
# -----------------------------
def output_or_answer(final: List[Dict], args, on_stream=None):
    if not args.answer:
        # Return the top-k results without generating an answer.
        return {
            "done": True,
            "sources": [
                {
                    "doc_id": rec.get("doc_id"),
                    "title": rec.get("title"),
                    "url": rec.get("url"),
                    "record_type": rec.get("record_type"),
                    "mime": rec.get("mime"),
                    "lang": rec.get("lang"),
                    "snippet": pick_any_text(rec),
                    "scores": {
                        "final": float(rec.get("_score", 0.0)),
                        "shadow": float(rec.get("_shadow")) if rec.get("_shadow") is not None else None,
                        "content": float(rec.get("_ann", 0.0)),
                        "rerank": float(rec.get("_rerank")) if rec.get("_rerank") is not None else None,
                    },
                }
                for rec in final
            ],
        }

    # Build the prompt for answering.
    context_blocks, sources = [], []
    for i, rec in enumerate(final, start=1):
        text = pick_any_text(rec)
        title = rec.get("title") or "(untitled)"
        url = rec.get("url") or title
        sources.append(f"[{i}] {url}")
        context_blocks.append(f"[{i}] {title}\n{text}")

    system = (
        "You are a careful researcher. Answer ONLY from the provided sources. "
        "Cite like [1], [2] in-line. If the answer is not in the sources, say you can't find it. "
        "Do not include private chain-of-thought or <think> tags."
    )
    prompt = (
        f"Question: {args.query}\n\n"
        "Use the sources below. If the question is not answerable from them, say so clearly.\n\n"
        "Sources:\n" + "\n\n".join(context_blocks) + "\n\n----\n\n"
        "Remember: only use these sources. Provide a concise answer with citations.\n\n"
        f"To repeat, the question to answer is: {args.query}"
    )

    full_answer = generate(args.ollama, args.gen_model, prompt, system=system, temperature=args.temperature, on_stream=on_stream)

    final_result = {
        "done": True,
        "answer": full_answer,
        "sources": [
            {
                "doc_id": rec.get("doc_id"),
                "title": rec.get("title"),
                "url": rec.get("url"),
            }
            for rec in final
        ],
    }
    if on_stream:
        on_stream(final_result)

    return final_result


# -----------------------------
# Main CLI (search / answer)
# -----------------------------
def run_cli(args):
    # Determine mode.
    hybrid_ok = all([args.shadow_index, args.shadow_store, args.content_index, args.content_store])

    single_pair: Optional[Tuple[str, str]] = None
    single_kind = None
    if not hybrid_ok:
        # Prefer the legacy pair if provided.
        if args.index and args.store:
            single_pair = (args.index, args.store)
            single_kind = "legacy"
        elif args.content_index and args.content_store:
            single_pair = (args.content_index, args.content_store)
            single_kind = "content"
        elif args.shadow_index and args.shadow_store:
            single_pair = (args.shadow_index, args.shadow_store)
            single_kind = "shadow"

    # Embed the query.
    q = norm_f32(embed_query(args.ollama, args.embed_model, args.query).reshape(1, -1))

    if single_pair:
        # SINGLE-INDEX path (works for legacy / content-only / shadow-only).
        index = faiss.read_index(single_pair[0])
        id2meta = load_meta(single_pair[1])

        ids, sims = faiss_search(index, q, min(args.candidates, index.ntotal))
        candidates = []
        for pos, _id in enumerate(ids):
            rec = dict(id2meta[_id])
            rec["_ann"] = sims[pos]
            candidates.append(rec)

        # Optional rerank.
        reranked_scores = None
        if not args.no_rerank and sentence_transformers_available():
            docs = [truncate_text(pick_any_text(c), args.max_doc_chars) for c in candidates]
            pairs = rerank_subprocess(
                args.query, docs,
                worker_path=Path(__file__),
                model=args.rerank_model,
                device=args.rerank_device,
                dtype=args.rerank_dtype,
                batch=args.rerank_batch,
                maxlen=args.rerank_maxlen,
            )
            if pairs is not None:
                reranked_scores = [None] * len(candidates)
                for local_idx, score in pairs:
                    if 0 <= local_idx < len(reranked_scores):
                        reranked_scores[local_idx] = float(score)
                min_score = min([s for s in reranked_scores if s is not None], default=0.0)
                reranked_scores = [s if s is not None else min_score for s in reranked_scores]
            else:
                print("[info] rerank disabled (worker failed).", file=sys.stderr)

        # Blend ANN and rerank scores.
        if reranked_scores is not None:
            z_ann = zscore([c["_ann"] for c in candidates])
            z_rr = zscore(reranked_scores)
            alpha = float(args.blend)
            final_scores = [(1 - alpha) * a + alpha * r for a, r in zip(z_ann, z_rr)]
            for rec, fs, rr in zip(candidates, final_scores, reranked_scores):
                rec["_score"] = float(fs)
                rec["_rerank"] = float(rr)
            candidates.sort(key=lambda r: r["_score"], reverse=True)
        else:
            for rec in candidates:
                rec["_score"] = rec["_ann"]

        final = candidates[: max(1, min(args.k, len(candidates)))]
        return output_or_answer(final, args)

    # HYBRID path
    shadow_index = faiss.read_index(args.shadow_index)
    shadow_meta = load_meta(args.shadow_store)
    content_index = faiss.read_index(args.content_index)
    content_meta = load_meta(args.content_store)

    # Stage A: shadow search -> document shortlist.
    sid_list, s_sim = faiss_search(shadow_index, q, min(args.shadow_candidates, shadow_index.ntotal))
    s_hits = [{"id": sid, "sim": sim, **shadow_meta[sid]} for sid, sim in zip(sid_list, s_sim)]

    # Optional shadow weighting by kind.
    kw = {}
    for kv in args.shadow_kind_weights.split(","):
        kv = kv.strip()
        if not kv:
            continue
        if ":" in kv:
            k, v = kv.split(":", 1)
            try:
                kw[k.strip().lower()] = float(v)
            except Exception:
                pass
    if kw:
        for h in s_hits:
            w = kw.get((h.get("kind") or "").lower(), 1.0)
            h["sim"] *= float(w)

    # Group shadow hits by doc_id.
    doc_scores: Dict[str, float] = {}
    for h in s_hits:
        did = derive_doc_id(h)
        doc_scores[did] = max(doc_scores.get(did, 0.0), float(h["sim"]))  # max over shadow signals

    # Stage B: content search (global).
    cid_list, c_sim = faiss_search(content_index, q, min(args.content_candidates, content_index.ntotal))
    c_hits = [{"id": cid, "sim": sim, **content_meta[cid]} for cid, sim in zip(cid_list, c_sim)]

    # Stage C: filter to the document shortlist.
    ordered_docs = sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)[: args.doc_top]
    if not ordered_docs:
        # Fallback: derive docs from the top content hits.
        tmp_docs = []
        seen = set()
        for h in c_hits:
            did = derive_doc_id(h)
            if did not in seen:
                seen.add(did)
                tmp_docs.append((did, float(h["sim"])))
            if len(tmp_docs) >= args.doc_top:
                break
        ordered_docs = tmp_docs
    shortlist = set(d for d, _ in ordered_docs)

    # Keep content hits belonging to the shortlist (fall back to global if empty).
    content_for_docs = [h for h in c_hits if derive_doc_id(h) in shortlist] or c_hits

    # Per-doc cap.
    per_doc = max(1, args.per_doc_chunks)
    doc_buckets: Dict[str, List[Dict]] = {}
    for h in content_for_docs:
        did = derive_doc_id(h)
        doc_buckets.setdefault(did, []).append(h)
    for did, arr in doc_buckets.items():
        arr.sort(key=lambda r: r["sim"], reverse=True)
        doc_buckets[did] = arr[:per_doc]

    # Flatten; the final score blends shadow (doc) and content (chunk) similarity.
    out_candidates: List[Dict] = []
    for did, doc_sim in ordered_docs:
        for ch in doc_buckets.get(did, []):
            final = dict(ch)
            final["_shadow"] = float(doc_sim)
            final["_ann"] = float(ch["sim"])
            alpha = float(args.doc_blend)   # weight of the shadow (doc) score
            beta = float(args.chunk_blend)  # weight of the chunk ANN score
            final["_score"] = alpha * float(doc_sim) + beta * float(ch["sim"])
            out_candidates.append(final)

    if not out_candidates:
        print("No retrieval results.", file=sys.stderr)
        if args.answer:
            print("No results from retrieval; cannot answer.")
        return

    out_candidates.sort(key=lambda r: r["_score"], reverse=True)

    # Optional rerank of the head of the pool.
    reranked_scores = None
    if not args.no_rerank and sentence_transformers_available():
        pool = out_candidates[: args.candidates]
        docs = [truncate_text(pick_any_text(c), args.max_doc_chars) for c in pool]
        pairs = rerank_subprocess(
            args.query, docs,
            worker_path=Path(__file__),
            model=args.rerank_model,
            device=args.rerank_device,
            dtype=args.rerank_dtype,
            batch=args.rerank_batch,
            maxlen=args.rerank_maxlen,
        )
        if pairs is not None:
            reranked_scores = [None] * len(pool)
            for local_idx, score in pairs:
                if 0 <= local_idx < len(pool):
                    reranked_scores[local_idx] = float(score)
            min_score = min([s for s in reranked_scores if s is not None], default=0.0)
            reranked_scores = [s if s is not None else min_score for s in reranked_scores]
            z_ann = zscore([c["_score"] for c in pool])
            z_rr = zscore(reranked_scores)
            alpha = float(args.blend)
            blended = [(1 - alpha) * a + alpha * r for a, r in zip(z_ann, z_rr)]
            for rec, fs, rr in zip(pool, blended, reranked_scores):
                rec["_score"] = float(fs)
                rec["_rerank"] = float(rr)
            out_candidates[: len(pool)] = sorted(pool, key=lambda r: r["_score"], reverse=True)
        else:
            print("[info] rerank disabled (worker failed).", file=sys.stderr)

    # Per-source cap and top-k.
    ordered = apply_per_source_cap(out_candidates, args.per_source_limit)
    final = ordered[: max(1, min(args.k, len(ordered)))]
    return output_or_answer(final, args)
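Stage A's grouping step (best shadow signal per document) can be sketched in isolation; the `doc_id` key here stands in for `derive_doc_id`, and the hits are made up:

```python
s_hits = [
    {"doc_id": "d1", "sim": 0.9},
    {"doc_id": "d1", "sim": 0.4},
    {"doc_id": "d2", "sim": 0.7},
]
doc_scores = {}
for h in s_hits:
    did = h["doc_id"]
    # Keep the strongest shadow signal per document.
    doc_scores[did] = max(doc_scores.get(did, 0.0), h["sim"])

ordered_docs = sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)
# [("d1", 0.9), ("d2", 0.7)]
```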
# -----------------------------
|
||||
# Rerank worker mode
|
||||
# -----------------------------
|
||||
def run_rerank_worker(args):
|
||||
"""
|
||||
Reads JSON from stdin: {"query": str, "docs": [str, ...]}
|
||||
Writes JSON to stdout: {"results": [{"index": int, "score": float}, ...]}
|
||||
"""
|
||||
try:
|
||||
import torch
|
||||
from sentence_transformers import CrossEncoder
|
||||
except Exception as e:
|
||||
# Gracefully tell parent we failed by returning empty results
|
||||
out = {"results": []}
|
||||
json.dump(out, sys.stdout)
|
||||
sys.stdout.flush()
|
||||
print(f"[worker] sentence_transformers unavailable: {e}", file=sys.stderr)
|
||||
return
|
||||
|
||||
try:
|
||||
torch.set_num_threads(1)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
device = args.rerank_device
|
||||
if device == "auto":
|
||||
device = "mps" if torch.backends.mps.is_available() else "cpu"
|
||||
|
||||
if args.rerank_dtype == "auto":
|
||||
dtype = torch.float16 if device == "mps" else torch.float32
|
||||
else:
|
||||
dtype = torch.float16 if args.rerank_dtype == "float16" else torch.float32
|
||||
|
||||
model = CrossEncoder(
|
||||
args.rerank_model,
|
||||
device=device,
|
||||
max_length=args.rerank_maxlen,
|
||||
automodel_args={"torch_dtype": dtype},
|
||||
)
|
||||
|
||||
data = json.load(sys.stdin)
|
||||
query = data["query"]
|
||||
docs = data["docs"]
|
||||
|
||||
pairs = [(query, d) for d in docs]
|
||||
scores = model.predict(
|
||||
pairs,
|
||||
batch_size=args.rerank_batch,
|
||||
convert_to_numpy=True,
|
||||
show_progress_bar=False,
|
||||
).tolist()
|
||||
|
||||
ordered = sorted(enumerate(scores), key=lambda x: x[1], reverse=True)
|
||||
out = {"results": [{"index": int(i), "score": float(s)} for i, s in ordered]}
|
||||
json.dump(out, sys.stdout)
|
||||
sys.stdout.flush()
|
||||
|
||||
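The stdin/stdout contract documented in the worker's docstring is easy to exercise with a stand-in scorer. The `fake_worker` below is hypothetical (the real worker scores query/doc pairs with a CrossEncoder), but it honors the same request and response shapes:

```python
import json

def fake_worker(stdin_text: str) -> str:
    # Same contract as run_rerank_worker: {"query", "docs"} in,
    # {"results": [{"index", "score"}, ...]} out, sorted by score descending.
    data = json.loads(stdin_text)
    query_terms = set(data["query"].lower().split())
    scores = [
        len(query_terms & set(doc.lower().split()))  # toy relevance: term overlap
        for doc in data["docs"]
    ]
    ordered = sorted(enumerate(scores), key=lambda x: x[1], reverse=True)
    return json.dumps(
        {"results": [{"index": i, "score": float(s)} for i, s in ordered]}
    )

request = json.dumps(
    {"query": "faiss index", "docs": ["a faiss index", "a cat", "index of cats"]}
)
response = json.loads(fake_worker(request))
```

The `index` field refers back to the caller's candidate pool, which is why the parent maps `(local_idx, score)` pairs onto `reranked_scores` instead of assuming the reply arrives in input order.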
# -----------------------------
# Argparse
# -----------------------------
def build_parser():
    ap = argparse.ArgumentParser(allow_abbrev=False)
    ap.add_argument("--mode", default="cli", choices=["cli", "rerank-worker"])

    # Legacy single-index I/O (kept for back-compat)
    ap.add_argument("--index", help="Single FAISS index (legacy)")
    ap.add_argument("--store", help="Single metadata JSONL (legacy)")
    ap.add_argument("--candidates", type=int, default=200, help="ANN neighbors to fetch (legacy or rerank pool).")

    # Hybrid I/O
    ap.add_argument("--shadow-index", help="FAISS index over shadow_text")
    ap.add_argument("--shadow-store", help="Metadata JSONL for shadow index")
    ap.add_argument("--content-index", help="FAISS index over content chunks")
    ap.add_argument("--content-store", help="Metadata JSONL for content index")

    ap.add_argument("--query", required=False)
    ap.add_argument("--ollama", default="http://localhost:11434")
    ap.add_argument("--embed-model", default="dengcao/Qwen3-Embedding-0.6B:F16")

    # Shadow/content retrieval sizes (hybrid)
    ap.add_argument("--shadow-candidates", type=int, default=400, help="Shadow ANN pool size")
    ap.add_argument("--content-candidates", type=int, default=600, help="Content ANN pool size")
    ap.add_argument("--doc-top", type=int, default=40, help="Top-N documents from shadow shortlist")
    ap.add_argument("--per-doc-chunks", type=int, default=2, help="Max chunks per doc from content pool")
    ap.add_argument("--doc-blend", type=float, default=0.6, help="Weight for shadow score in final blend [0..1]")
    ap.add_argument("--chunk-blend", type=float, default=0.4, help="Weight for content-ANN score in final blend [0..1]")
    ap.add_argument("--shadow-kind-weights", default="image:1.2,code:1.1", help="Comma list 'kind:weight' to bias doc ranking")

    # Rerank knobs
    ap.add_argument("--no-rerank", action="store_true", help="Disable reranking.")
    ap.add_argument("--blend", type=float, default=0.75, help="Blend weight for reranker in normalized score [0..1].")
    ap.add_argument("--rerank-model", default="cross-encoder/ms-marco-MiniLM-L-6-v2")
    ap.add_argument("--rerank-device", default="auto", choices=["auto", "mps", "cpu"])
    ap.add_argument("--rerank-dtype", default="auto", choices=["auto", "float16", "float32"])
    ap.add_argument("--rerank-batch", type=int, default=64)
    ap.add_argument("--rerank-maxlen", type=int, default=256)
    ap.add_argument("--stdio", action="store_true", help=argparse.SUPPRESS)  # worker-only flag

    # Output / answer
    ap.add_argument("--json", action="store_true", help="Print search results as JSON.")
    ap.add_argument("--pretty", action="store_true", help="Pretty-print search results.")
    ap.add_argument("--show-scores", action="store_true", help="Show ANN/rerank scores in pretty output.")
    ap.add_argument("--answer", action="store_true", help="Generate an answer with an LLM using top-k contexts.")
    ap.add_argument("--gen-model", default="qwen3:4b",
                    help="Any chat-capable model in Ollama (e.g., 'qwen2.5:7b-instruct', 'llama3.1:8b-instruct').")
    ap.add_argument("--temperature", type=float, default=0.2)
    ap.add_argument("--k", type=int, default=10, help="Number of final results to return/use.")

    # Misc
    ap.add_argument("--max-doc-chars", type=int, default=4000, help="Truncate each candidate before rerank.")
    ap.add_argument("--per-source-limit", type=int, default=3, help="Max results per source (url/doc) to diversify.")

    return ap
def run_query(shadow_index: Path, shadow_store: Path,
              content_index: Path, content_store: Path,
              query: str, *, answer: bool = False,
              on_stream: Optional[Callable[[Dict], None]] = None, **opts) -> dict:

    # Ensure all paths are strings for argparse.Namespace
    _shadow_index = str(shadow_index) if shadow_index else None
    _shadow_store = str(shadow_store) if shadow_store else None
    _content_index = str(content_index) if content_index else None
    _content_store = str(content_store) if content_store else None

    args = argparse.Namespace(
        shadow_index=_shadow_index,
        shadow_store=_shadow_store,
        content_index=_content_index,
        content_store=_content_store,
        query=query,
        answer=answer,
        ollama=opts.get("ollama", "http://localhost:11434"),
        embed_model=opts.get("embed_model", "dengcao/Qwen3-Embedding-0.6B:F16"),
        shadow_candidates=opts.get("shadow_candidates", 400),
        content_candidates=opts.get("content_candidates", 600),
        doc_top=opts.get("doc_top", 40),
        per_doc_chunks=opts.get("per_doc_chunks", 2),
        doc_blend=opts.get("doc_blend", 0.6),
        chunk_blend=opts.get("chunk_blend", 0.4),
        shadow_kind_weights=opts.get("shadow_kind_weights", "image:1.2,code:1.1"),
        no_rerank=opts.get("no_rerank", True),  # Reranker OFF by default
        blend=opts.get("blend", 0.75),
        rerank_model=opts.get("rerank_model", "cross-encoder/ms-marco-MiniLM-L-6-v2"),
        rerank_device=opts.get("rerank_device", "auto"),
        rerank_dtype=opts.get("rerank_dtype", "auto"),
        rerank_batch=opts.get("rerank_batch", 64),
        rerank_maxlen=opts.get("rerank_maxlen", 256),
        gen_model=opts.get("gen_model", "qwen3:4b"),
        temperature=opts.get("temperature", 0.2),
        k=opts.get("k", 10),
        max_doc_chars=opts.get("max_doc_chars", 4000),
        per_source_limit=opts.get("per_source_limit", 3),
        json=True,  # Force JSON-like output dict
    )

    return run_cli(args, on_stream=on_stream)
def main():
    ap = build_parser()
    args = ap.parse_args()

    if args.mode == "rerank-worker":
        return run_rerank_worker(args)

    if not args.query:
        ap.error("--query is required in cli mode")

    hybrid_ok = all([args.shadow_index, args.shadow_store, args.content_index, args.content_store])
    if not hybrid_ok:
        ap.error("For CLI use, all four index/store paths are required for hybrid retrieval.")

    # Drop the keys passed explicitly below so **opts does not duplicate them.
    explicit = {"shadow_index", "shadow_store", "content_index", "content_store", "query", "answer"}
    opts = {k: v for k, v in vars(args).items() if k not in explicit}

    result = run_query(
        shadow_index=Path(args.shadow_index),
        shadow_store=Path(args.shadow_store),
        content_index=Path(args.content_index),
        content_store=Path(args.content_store),
        query=args.query,
        answer=args.answer,
        on_stream=lambda d: print(json.dumps(d, ensure_ascii=False), flush=True) if args.answer else None,
        **opts,
    )

    if not args.answer:
        print(json.dumps(result, ensure_ascii=False))


if __name__ == "__main__":
    main()
@@ -3,9 +3,24 @@ fastapi==0.111.0
 uvicorn[standard]==0.30.1
 SQLAlchemy==2.0.32
 httpx==0.27.0
-pydantic==2.7.4
+pydantic==2.11.7
 requests>=2.32.0
+
+# Web search enrichment dependencies
+beautifulsoup4==4.12.3
+httpx[http2]>=0.27.0
+numpy
+
+# Local RAG pipeline dependencies
+faiss-cpu>=1.8.0
+PyMuPDF>=1.24.0
+ebooklib>=0.18
+chardet>=5.2.0
+Pillow>=10.0.0
+langid>=1.1.6
+langdetect>=1.0.9
+
+# Optional but recommended for richer ingestion / reranking:
+# openai-whisper
+# opencv-python-headless
+# sentence-transformers
@@ -1,4 +1,4 @@
-from pydantic import BaseModel
+from pydantic import BaseModel, ConfigDict
 from typing import List, Optional
 from datetime import datetime
@@ -38,8 +38,7 @@ class ChatSession(BaseModel):
     name: str
     created_at: datetime
 
-    class Config:
-        orm_mode = True
+    model_config = ConfigDict(from_attributes=True)
 
 class SessionsResponse(BaseModel):
     sessions: List[ChatSession]
@@ -67,3 +66,4 @@ class WebSearchRequest(BaseModel):
 class WebSearchResponse(BaseModel):
     enriched_prompt: str
     sources: List[str] = []
+    context_block: str = ""
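The `orm_mode` to `model_config = ConfigDict(from_attributes=True)` change above is Pydantic's v1 to v2 rename of the same switch. A minimal sketch, assuming Pydantic v2 is installed; the `id` field and `FakeRow` class are illustrative additions, not part of the app's actual schema:

```python
from datetime import datetime
from pydantic import BaseModel, ConfigDict

class ChatSession(BaseModel):
    model_config = ConfigDict(from_attributes=True)  # replaces v1's orm_mode = True

    id: int  # hypothetical field for this sketch
    name: str
    created_at: datetime

class FakeRow:
    # Stand-in for a SQLAlchemy row: data lives in attributes, not dict keys.
    id = 1
    name = "First chat"
    created_at = datetime(2024, 1, 1)

# from_attributes=True lets model_validate read plain attributes.
session = ChatSession.model_validate(FakeRow())
```

In v1 the equivalent call was `ChatSession.from_orm(row)`; v2 folds it into `model_validate` once `from_attributes` is set.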
1 dist/assets/index-Cc0DLWqA.css vendored Normal file
File diff suppressed because one or more lines are too long

69 dist/assets/index-DKAz6gtp.js vendored Normal file
File diff suppressed because one or more lines are too long

14 dist/index.html vendored Normal file
@@ -0,0 +1,14 @@
+
+<!doctype html>
+<html lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <title>LLM Desktop</title>
+    <script type="module" crossorigin src="/assets/index-DKAz6gtp.js"></script>
+    <link rel="stylesheet" crossorigin href="/assets/index-Cc0DLWqA.css">
+  </head>
+  <body>
+    <div id="root"></div>
+  </body>
+</html>
@@ -1,5 +1,4 @@
-
-const { app, BrowserWindow, Menu, ipcMain, shell } = require('electron')
+const { app, BrowserWindow, Menu, dialog, ipcMain, shell } = require('electron')
 const path = require('path')
 const { is } = require('@electron-toolkit/utils')
 const fs = require('fs')
@@ -9,12 +8,36 @@ let settingsWindow = null
 
 const settingsFilePath = path.join(app.getPath('userData'), 'settings.json')
 let appSettings = {}
+const DEFAULT_UI_SCALE = 1
+const MIN_UI_SCALE = 0.7
+const MAX_UI_SCALE = 1.3
 
 // Default settings
 const defaultSettings = {
   ollamaApiUrl: 'http://127.0.0.1:8000',
   colorScheme: 'Default',
-  chatModel: 'llama3' // Set a default model here
+  uiScale: DEFAULT_UI_SCALE,
+  chatModel: 'llama3',
 }
+
+function normalizeUiScale(value) {
+  const numericValue = Number(value)
+  if (!Number.isFinite(numericValue)) {
+    return DEFAULT_UI_SCALE
+  }
+
+  return Math.min(MAX_UI_SCALE, Math.max(MIN_UI_SCALE, Math.round(numericValue * 100) / 100))
+}
+
+function applyUiScaleToWindow(window) {
+  if (!window || window.isDestroyed()) {
+    return
+  }
+
+  window.webContents.setZoomFactor(normalizeUiScale(appSettings.uiScale))
+}
+
+function applyUiScaleToAllWindows() {
+  BrowserWindow.getAllWindows().forEach(applyUiScaleToWindow)
+}
@@ -24,8 +47,9 @@ function loadSettings() {
     appSettings = { ...defaultSettings, ...JSON.parse(data) }
   } else {
     appSettings = { ...defaultSettings }
-    saveSettings() // Create the file with default settings
+    saveSettings()
   }
+  appSettings.uiScale = normalizeUiScale(appSettings.uiScale)
 } catch (error) {
   console.error('Failed to load settings:', error)
   appSettings = { ...defaultSettings }
@@ -40,7 +64,7 @@ function saveSettings() {
   }
 }
 
-async function createMainWindow () {
+async function createMainWindow() {
   mainWindow = new BrowserWindow({
     width: 1000,
     height: 720,
@@ -50,17 +74,23 @@ async function createMainWindow() {
   webPreferences: {
     preload: path.join(__dirname, 'preload.cjs'),
     contextIsolation: true,
-    nodeIntegration: false
-  }
+    nodeIntegration: false,
+  },
 })
 
+applyUiScaleToWindow(mainWindow)
+
 mainWindow.on('ready-to-show', () => {
   mainWindow.show()
 })
 
+mainWindow.webContents.on('did-finish-load', () => {
+  applyUiScaleToWindow(mainWindow)
+})
+
 mainWindow.on('focus', () => {
-  mainWindow.webContents.send('window-focused');
-});
+  mainWindow.webContents.send('window-focused')
+})
 
 if (is.dev && process.env.VITE_DEV_SERVER_URL) {
   await mainWindow.loadURL(process.env.VITE_DEV_SERVER_URL)
@@ -70,12 +100,12 @@ async function createMainWindow() {
 }
 
 mainWindow.webContents.setWindowOpenHandler(({ url }) => {
-    shell.openExternal(url);
-    return { action: 'deny' };
-  });
+    shell.openExternal(url)
+    return { action: 'deny' }
+  })
 }
 
-async function createSettingsWindow () {
+async function createSettingsWindow() {
   if (settingsWindow) {
     settingsWindow.focus()
     return
@@ -91,14 +121,20 @@ async function createSettingsWindow() {
 webPreferences: {
   preload: path.join(__dirname, 'preload.cjs'),
   contextIsolation: true,
-  nodeIntegration: false
-}
+  nodeIntegration: false,
+},
 })
 
+applyUiScaleToWindow(settingsWindow)
+
 settingsWindow.on('ready-to-show', () => {
   settingsWindow.show()
 })
 
+settingsWindow.webContents.on('did-finish-load', () => {
+  applyUiScaleToWindow(settingsWindow)
+})
+
 settingsWindow.on('closed', () => {
   settingsWindow = null
 })
@@ -112,7 +148,7 @@ async function createSettingsWindow() {
 }
 
 app.whenReady().then(() => {
-  loadSettings() // Load settings when the app is ready
+  loadSettings()
   createMainWindow()
 
   const menuTemplate = [
@@ -122,11 +158,11 @@ app.whenReady().then(() => {
     {
       label: 'Settings',
       accelerator: 'CmdOrCtrl+,',
-      click: createSettingsWindow
+      click: createSettingsWindow,
     },
     { type: 'separator' },
-    { role: 'quit' }
-  ]
+    { role: 'quit' },
+  ],
 },
 {
   label: 'Edit',
@@ -139,8 +175,8 @@ app.whenReady().then(() => {
   { role: 'paste' },
   { role: 'delete' },
   { type: 'separator' },
-  { role: 'selectAll' }
-]
+  { role: 'selectAll' },
+],
 },
 {
   label: 'View',
@@ -153,9 +189,9 @@ app.whenReady().then(() => {
   { role: 'zoomin' },
   { role: 'zoomout' },
   { type: 'separator' },
-  { role: 'togglefullscreen' }
-]
-}
+  { role: 'togglefullscreen' },
+],
+},
 ]
 
 const menu = Menu.buildFromTemplate(menuTemplate)
@@ -166,23 +202,40 @@ app.whenReady().then(() => {
 })
 })
 
-// IPC handlers for settings
-ipcMain.handle('get-settings', () => {
-  return appSettings
-})
+ipcMain.handle('get-settings', () => appSettings)
 
 ipcMain.handle('set-setting', (event, key, value) => {
-  appSettings[key] = value
+  appSettings[key] = key === 'uiScale' ? normalizeUiScale(value) : value
   saveSettings()
+  if (key === 'uiScale') {
+    applyUiScaleToAllWindows()
+  }
   return true
 })
 
+ipcMain.handle('update-settings', (event, settings) => {
+  appSettings = { ...appSettings, ...settings }
+  appSettings.uiScale = normalizeUiScale(appSettings.uiScale)
+  saveSettings()
+  if (Object.prototype.hasOwnProperty.call(settings, 'uiScale')) {
+    applyUiScaleToAllWindows()
+  }
+  return true
+})
+
+ipcMain.handle('pick-paths', async () => {
+  const result = await dialog.showOpenDialog(mainWindow, {
+    properties: ['openFile', 'openDirectory', 'multiSelections'],
+  })
+  return result.canceled ? [] : result.filePaths
+})
+
+ipcMain.handle('open-path', async (event, filePath) => {
+  if (!filePath) return false
+  const err = await shell.openPath(filePath)
+  return err === ''
+})
+
 ipcMain.on('open-external-link', (event, url) => {
   shell.openExternal(url)
 })
@@ -6,6 +6,8 @@ contextBridge.exposeInMainWorld('electronAPI', {
   getSettings: () => ipcRenderer.invoke('get-settings'),
   setSetting: (key, value) => ipcRenderer.invoke('set-setting', key, value),
   updateSettings: (settings) => ipcRenderer.invoke('update-settings', settings),
+  pickPaths: () => ipcRenderer.invoke('pick-paths'),
+  openPath: (filePath) => ipcRenderer.invoke('open-path', filePath),
   openExternalLink: (event) => {
     event.preventDefault();
     const url = event.currentTarget.href;
@@ -6,7 +6,7 @@
   "type": "module",
   "scripts": {
     "dev": "concurrently -k \"npm:dev:backend\" \"npm:dev:renderer\" \"npm:dev:electron\"",
-    "dev:backend": "python3 -m uvicorn backend.main:app --host 127.0.0.1 --port 8000 --reload",
+    "dev:backend": "backend/.venv/bin/python -m uvicorn backend.main:app --host 127.0.0.1 --port 8000 --reload",
    "dev:renderer": "vite --port 5173 --strictPort",
    "dev:electron": "wait-on http://localhost:5173 tcp:8000 && cross-env VITE_DEV_SERVER_URL=http://localhost:5173 electron .",
    "build": "vite build",
19 run.sh
@@ -1,6 +1,19 @@
 #!/bin/sh
-python -m venv backend/.venv
-source backend/.venv/bin/activate
-pip install -r backend/requirements.txt
+set -eu
+
+PYTHON_BIN="${PYTHON_BIN:-python3.13}"
+VENV_DIR="backend/.venv"
+
+if ! command -v "$PYTHON_BIN" >/dev/null 2>&1; then
+  echo "Python 3.13 is required. Set PYTHON_BIN to a Python 3.13 executable if needed." >&2
+  exit 1
+fi
+
+if [ ! -x "$VENV_DIR/bin/python" ] || ! "$VENV_DIR/bin/python" -c 'import sys; raise SystemExit(0 if sys.version_info[:2] == (3, 13) else 1)'; then
+  rm -rf "$VENV_DIR"
+  "$PYTHON_BIN" -m venv "$VENV_DIR"
+fi
+
+"$VENV_DIR/bin/python" -m pip install -r backend/requirements.txt
 npm install
 npm run dev
839 src/App.jsx
File diff suppressed because it is too large
@@ -1,27 +1,58 @@
-import React, { useState, useEffect } from 'react';
-import { colorSchemes, applyColorScheme } from './colorSchemes';
+import React, { useEffect, useState } from 'react'
+import { colorSchemes, applyColorScheme } from './colorSchemes'
 
-const COLOR_SCHEME_KEY = 'colorScheme';
+const COLOR_SCHEME_KEY = 'colorScheme'
+const UI_SCALE_KEY = 'uiScale'
+const DEFAULT_UI_SCALE = 1
+const MIN_UI_SCALE = 0.7
+const MAX_UI_SCALE = 1.3
+const UI_SCALE_STEP = 0.05
+
+function normalizeUiScale(value) {
+  const numericValue = Number(value)
+  if (!Number.isFinite(numericValue)) {
+    return DEFAULT_UI_SCALE
+  }
+
+  return Math.min(MAX_UI_SCALE, Math.max(MIN_UI_SCALE, Math.round(numericValue * 100) / 100))
+}
 
 export default function InterfaceSettings() {
-  const [selectedColorScheme, setSelectedColorScheme] = useState('Default');
+  const [selectedColorScheme, setSelectedColorScheme] = useState('Default')
+  const [uiScale, setUiScale] = useState(DEFAULT_UI_SCALE)
 
   useEffect(() => {
     window.electronAPI.getSettings().then(settings => {
-      setSelectedColorScheme(settings.colorScheme);
-      applyColorScheme(settings.colorScheme);
-    });
-  }, []);
+      const schemeName = settings.colorScheme || 'Default'
+      setSelectedColorScheme(schemeName)
+      setUiScale(normalizeUiScale(settings.uiScale))
+      applyColorScheme(schemeName)
+    })
+  }, [])
 
   useEffect(() => {
-    applyColorScheme(selectedColorScheme);
-  }, [selectedColorScheme]);
+    applyColorScheme(selectedColorScheme)
+  }, [selectedColorScheme])
 
-  const handleColorSchemeChange = (e) => {
-    const newScheme = e.target.value;
-    setSelectedColorScheme(newScheme);
-    window.electronAPI.setSetting(COLOR_SCHEME_KEY, newScheme);
-  };
+  const handleColorSchemeChange = (event) => {
+    const newScheme = event.target.value
+    setSelectedColorScheme(newScheme)
+    window.electronAPI.setSetting(COLOR_SCHEME_KEY, newScheme)
+  }
+
+  const persistUiScale = (value) => {
+    const nextScale = normalizeUiScale(value)
+    setUiScale(nextScale)
+    window.electronAPI.setSetting(UI_SCALE_KEY, nextScale)
+  }
+
+  const handleUiScaleChange = (event) => {
+    persistUiScale(event.target.value)
+  }
+
+  const handleUiScaleReset = () => {
+    persistUiScale(DEFAULT_UI_SCALE)
+  }
 
   return (
     <div className="settings-content-panel">
@@ -39,6 +70,32 @@ export default function InterfaceSettings() {
         ))}
       </select>
     </div>
+      <div className="setting-section">
+        <h3>UI Scale</h3>
+        <div className="setting-control-row">
+          <input
+            type="range"
+            className="range-input"
+            min={MIN_UI_SCALE}
+            max={MAX_UI_SCALE}
+            step={UI_SCALE_STEP}
+            value={uiScale}
+            onChange={handleUiScaleChange}
+          />
+          <span className="setting-value">{Math.round(uiScale * 100)}%</span>
+          <button
+            type="button"
+            className="button"
+            onClick={handleUiScaleReset}
+            disabled={uiScale === DEFAULT_UI_SCALE}
+          >
+            Reset
+          </button>
+        </div>
+        <p className="setting-description">
+          Scales the whole interface, including fonts, spacing, and controls. 100% is the default size.
+        </p>
+      </div>
   </div>
-  );
+  )
 }
274 src/LibraryManager.jsx Normal file
@@ -0,0 +1,274 @@
|
||||
import React, { useEffect, useState } from 'react'
|
||||
|
||||
function statusLabel(job) {
|
||||
if (!job) return null
|
||||
const progress = typeof job.progress === 'number' ? `${job.progress.toFixed(0)}%` : null
|
||||
const detail = job.detail ? ` ${job.detail}` : ''
|
||||
return `${job.type} · ${job.status}${progress ? ` · ${progress}` : ''}${detail}`
|
||||
}
|
||||
|
||||
export default function LibraryManager({
|
||||
apiBase,
|
||||
library,
|
||||
jobs,
|
||||
chatLibrarySlug,
|
||||
onRefresh,
|
||||
onToggleChatLibrary,
|
||||
onDeleted
|
||||
}) {
|
||||
const [busy, setBusy] = useState(false)
|
||||
const [isRenaming, setIsRenaming] = useState(false)
|
||||
const [renameValue, setRenameValue] = useState('')
|
||||
const [confirmDelete, setConfirmDelete] = useState(false)
|
||||
const [errorMessage, setErrorMessage] = useState('')
|
||||
|
||||
useEffect(() => {
|
||||
setIsRenaming(false)
|
||||
setRenameValue(library?.name || '')
|
||||
setConfirmDelete(false)
|
||||
setErrorMessage('')
|
||||
}, [library?.slug, library?.name])
|
||||
|
||||
async function expectOk(response) {
|
||||
if (response.ok) return response
|
||||
const detail = await response.text()
|
||||
throw new Error(detail || `HTTP ${response.status}`)
|
||||
}
|
||||
|
||||
async function runAction(fn) {
|
||||
setBusy(true)
|
||||
try {
|
||||
setErrorMessage('')
|
||||
await fn()
|
||||
setConfirmDelete(false)
|
||||
} finally {
|
||||
setBusy(false)
|
||||
await onRefresh()
|
||||
}
|
||||
}
|
||||
|
||||
async function addPaths() {
|
||||
if (!library) return
|
||||
const paths = await window.electronAPI?.pickPaths?.()
|
||||
if (!Array.isArray(paths) || paths.length === 0) return
|
||||
try {
|
||||
await runAction(async () => {
|
||||
const response = await fetch(`${apiBase}/libraries/${library.slug}/files/register`, {
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify({ paths })
|
||||
})
|
||||
await expectOk(response)
|
||||
})
|
||||
} catch (error) {
|
||||
setErrorMessage(String(error?.message || error))
|
||||
}
|
||||
}
|
||||
|
||||
async function removeFile(rel) {
|
||||
if (!library) return
|
||||
try {
|
||||
await runAction(async () => {
|
||||
const response = await fetch(`${apiBase}/libraries/${library.slug}/files`, {
|
||||
method: 'DELETE',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify({ rel })
|
||||
})
|
||||
await expectOk(response)
|
||||
})
|
||||
} catch (error) {
|
||||
setErrorMessage(String(error?.message || error))
|
||||
}
|
||||
}
|
||||
|
||||
async function renameLibrary() {
|
||||
if (!library) return
|
||||
const name = renameValue.trim()
|
||||
if (!name || name === library.name) {
|
||||
setIsRenaming(false)
|
||||
setRenameValue(library.name || '')
|
||||
return
|
||||
}
|
||||
await runAction(async () => {
|
||||
const response = await fetch(`${apiBase}/libraries/${library.slug}`, {
|
||||
method: 'PATCH',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify({ name })
|
||||
})
|
||||
await expectOk(response)
|
||||
})
|
||||
setIsRenaming(false)
|
||||
}
|
||||
|
||||
async function deleteLibrary() {
|
||||
if (!library) return
|
||||
await runAction(async () => {
|
||||
const response = await fetch(`${apiBase}/libraries/${library.slug}`, { method: 'DELETE' })
|
||||
await expectOk(response)
|
||||
})
|
||||
onDeleted?.(library.slug)
|
||||
}
|
||||
|
||||
async function startJob(kind) {
|
||||
if (!library) return
|
||||
try {
|
||||
await runAction(async () => {
|
||||
const endpoint = `${apiBase}/libraries/${library.slug}/jobs/${kind}`
|
||||
const options = {
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' }
|
||||
}
|
||||
if (kind === 'embed') {
|
||||
options.body = JSON.stringify({})
|
||||
}
|
||||
const response = await fetch(endpoint, options)
|
||||
await expectOk(response)
|
||||
})
|
||||
} catch (error) {
|
||||
setErrorMessage(String(error?.message || error))
|
||||
}
|
||||
}
|
||||
|
||||
if (!library) {
|
||||
return (
|
||||
<div className="placeholder-view">
|
||||
<p>Create a database, add files or folders, then build and index it for local RAG.</p>
|
||||
</div>
|
||||
)
|
||||
}
|
||||
|
||||
const activeJobs = (jobs || []).filter(job => job.slug === library.slug && (job.status === 'queued' || job.status === 'running'))
|
||||
const usingInChat = chatLibrarySlug === library.slug
|
||||
const canStartRename = () => {
|
||||
setRenameValue(library.name || '')
|
||||
setErrorMessage('')
|
||||
setIsRenaming(true)
|
||||
setConfirmDelete(false)
|
||||
}
|
||||
|
||||
return (
|
||||
<div className="library-panel">
|
||||
{isRenaming && (
|
||||
<div className="library-inline-form">
|
||||
<input
|
||||
type="text"
|
||||
className="rename-input"
|
||||
value={renameValue}
|
||||
onChange={(e) => setRenameValue(e.target.value)}
|
||||
onKeyDown={(e) => {
|
||||
if (e.key === 'Enter') {
|
||||
renameLibrary().catch((error) => setErrorMessage(String(error?.message || error)))
|
||||
} else if (e.key === 'Escape') {
|
||||
setIsRenaming(false)
|
||||
setRenameValue(library.name || '')
|
||||
}
|
||||
}}
|
||||
autoFocus
|
||||
/>
|
||||
<div className="new-db-actions">
|
||||
<button
|
||||
className="button"
|
||||
disabled={busy}
|
||||
onClick={() => renameLibrary().catch((error) => setErrorMessage(String(error?.message || error)))}
|
||||
>
|
||||
Save
|
||||
</button>
|
||||
<button
|
||||
className="button ghost"
|
||||
onClick={() => {
|
||||
setIsRenaming(false)
|
||||
setRenameValue(library.name || '')
|
||||
}}
|
||||
>
|
||||
Cancel
|
||||
</button>
|
||||
</div>
|
||||
</div>
|
||||
)}
|
||||
|
||||
{confirmDelete && (
|
||||
<div className="library-inline-form danger-zone">
|
||||
<div className="muted-copy">Delete "{library.name}"? This removes the local index and metadata for this database.</div>
|
||||
<div className="new-db-actions">
|
||||
<button
|
||||
className="button danger"
|
||||
disabled={busy}
|
||||
onClick={() => deleteLibrary().catch((error) => setErrorMessage(String(error?.message || error)))}
|
||||
>
|
||||
Confirm Delete
|
||||
</button>
|
||||
<button className="button ghost" onClick={() => setConfirmDelete(false)}>Cancel</button>
|
||||
</div>
|
||||
</div>
|
||||
)}
|
||||
|
||||
{errorMessage && <div className="form-error">{errorMessage}</div>}
|
||||
|
||||
<div className="library-toolbar">
|
||||
<button className="button" disabled={busy} onClick={addPaths}>Add Files</button>
|
||||
<button className="button" disabled={busy || !library.files?.length} onClick={() => startJob('build')}>Build Corpus</button>
|
||||
<button className="button" disabled={busy || !library.states?.has_corpus} onClick={() => startJob('enrich')}>Enrich</button>
|
||||
        <button className="button" disabled={busy || !library.states?.has_corpus} onClick={() => startJob('embed')}>Index</button>
        <button className="button" onClick={canStartRename}>Rename</button>
        <button className="button" onClick={() => onToggleChatLibrary(usingInChat ? null : library.slug)}>
          {usingInChat ? 'Stop Using In Chat' : 'Use In Chat'}
        </button>
        <button
          className="button danger"
          onClick={() => {
            setConfirmDelete(true)
            setIsRenaming(false)
            setErrorMessage('')
          }}
        >
          Delete
        </button>
      </div>

      <div className="library-states">
        <div className={`state-pill ${library.states?.has_files ? 'ready' : ''}`}>Files: {library.files?.length || 0}</div>
        <div className={`state-pill ${library.states?.has_corpus ? 'ready' : ''}`}>Corpus: {library.artifacts?.corpus_records || 0}</div>
        <div className={`state-pill ${library.states?.is_enriched ? 'ready' : ''}`}>Enriched: {library.artifacts?.enhanced_records || 0}</div>
        <div className={`state-pill ${library.states?.is_indexed ? 'ready' : ''}`}>Indexed</div>
      </div>

      {usingInChat && (
        <div className="library-chat-note">
          This database will be queried before each chat request and its context will be appended to the prompt.
        </div>
      )}

      {activeJobs.length > 0 && (
        <div className="library-jobs">
          {activeJobs.map(job => (
            <div key={job.id} className={`job-card ${job.status}`}>
              {statusLabel(job)}
            </div>
          ))}
        </div>
      )}

      <div className="library-files">
        <h2>Files</h2>
        {library.files?.length ? (
          <div className="library-file-list">
            {library.files.map(file => (
              <div key={file.sha256 || file.rel} className="library-file-row">
                <div className="library-file-meta">
                  <div className="library-file-name">{file.name || file.path}</div>
                  <div className="library-file-path">{file.path}</div>
                </div>
                <div className="library-file-actions">
                  <button className="button ghost" onClick={() => window.electronAPI?.openPath?.(file.path)}>Open</button>
                  <button className="button ghost" onClick={() => removeFile(file.rel)}>Remove</button>
                </div>
              </div>
            ))}
          </div>
        ) : (
          <p className="muted-copy">No files registered yet.</p>
        )}
      </div>
    </div>
  )
}
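The job cards above render `statusLabel(job)`, which is defined elsewhere in LibraryManager.jsx. A hypothetical sketch of such a helper, runnable on its own — the job fields used here (`kind`, `status`, `progress`) are assumptions, not the component's actual shape:

```javascript
// Illustrative stand-in for the statusLabel helper the job cards call.
// Field names are assumed; the real component defines its own version.
function statusLabel(job) {
  const name = job.kind === 'embed' ? 'Indexing' : 'Processing'
  if (job.status === 'running') {
    // Render a percentage from a 0..1 progress fraction.
    return `${name}… ${Math.round((job.progress || 0) * 100)}%`
  }
  if (job.status === 'failed') return `${name} failed`
  return `${name} complete`
}

console.log(statusLabel({ kind: 'embed', status: 'running', progress: 0.4 })) // → "Indexing… 40%"
```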
src/colorSchemes.js

@@ -1,5 +1,18 @@
 const colorSchemes = {
-  'Nightsky': {
+  Default: {
+    '--bg': '#0b1020',
+    '--panel': '#141b34',
+    '--text': '#e6e8ef',
+    '--muted': '#9aa3b2',
+    '--accent': '#6ea8fe',
+    '--border': '#24304f',
+    '--input-bg': '#121933',
+    '--user-msg-bg': '#18213d',
+    '--assistant-msg-bg': '#10172d',
+    '--active-bg': 'rgba(110, 168, 254, 0.16)',
+    '--hover-bg': 'rgba(255, 255, 255, 0.06)',
+  },
+  Nightsky: {
     '--bg': '#0a0e1a',
     '--panel': '#18203a',
     '--text': '#ffffff',
@@ -12,7 +25,7 @@ const colorSchemes = {
     '--active-bg': 'rgba(74, 144, 226, 0.15)',
     '--hover-bg': 'rgba(255, 255, 255, 0.05)',
   },
-  'Grayscale': {
+  Grayscale: {
     '--bg': '#1a1a1a',
     '--panel': '#2a2a2a',
     '--text': '#f0f0f0',
@@ -25,33 +38,33 @@ const colorSchemes = {
     '--active-bg': 'rgba(136, 136, 136, 0.15)',
     '--hover-bg': 'rgba(255, 255, 255, 0.05)',
   },
-  'Japan': {
+  Japan: {
     '--bg': '#ffffff',
     '--panel': '#f5f5f5',
     '--text': '#000000',
     '--muted': '#444444',
-    '--accent': '#e74c3c', /* Vibrant Red */
+    '--accent': '#e74c3c',
     '--border': '#999999',
     '--input-bg': '#ffffff',
     '--user-msg-bg': '#f0f0f0',
     '--assistant-msg-bg': '#f0f0f0',
-    '--active-bg': 'rgba(231, 76, 60, 0.15)', /* Light red for active */
-    '--hover-bg': 'rgba(231, 76, 60, 0.08)', /* Lighter red for hover */
+    '--active-bg': 'rgba(231, 76, 60, 0.15)',
+    '--hover-bg': 'rgba(231, 76, 60, 0.08)',
   },
-  'Lime': {
+  Lime: {
     '--bg': '#f0fff0',
     '--panel': '#e0ffe0',
     '--text': '#1a1a1a',
     '--muted': '#72a272ff',
-    '--accent': '#deef88',
+    '--accent': '#8e9f38ff',
     '--border': '#a0c0a0',
     '--input-bg': '#ffffff',
-    '--user-msg-bg': '#f8f7ad',
+    '--user-msg-bg': '#f8f7adff',
    '--assistant-msg-bg': '#f5fff5',
     '--active-bg': 'rgba(104, 159, 56, 0.2)',
     '--hover-bg': 'rgba(104, 159, 56, 0.1)',
   },
-  'Vampire': {
+  Vampire: {
     '--bg': '#1a050a',
     '--panel': '#2a1015',
     '--text': '#ffefff',
@@ -64,15 +77,80 @@ const colorSchemes = {
     '--active-bg': 'rgba(216, 27, 96, 0.15)',
     '--hover-bg': 'rgba(255, 255, 255, 0.05)',
   },
-};
+  'Sunset Drive': {
+    '--bg': '#1f1024',
+    '--panel': '#2e1632',
+    '--text': '#fff2ea',
+    '--muted': '#caa8b7',
+    '--accent': '#ff8a5b',
+    '--border': '#593050',
+    '--input-bg': '#26132a',
+    '--user-msg-bg': '#442038',
+    '--assistant-msg-bg': '#32172c',
+    '--active-bg': 'rgba(255, 138, 91, 0.18)',
+    '--hover-bg': 'rgba(255, 138, 91, 0.08)',
+  },
+  'Aurora Pulse': {
+    '--bg': '#07171d',
+    '--panel': '#102730',
+    '--text': '#eafcff',
+    '--muted': '#9bc8cf',
+    '--accent': '#54f2c2',
+    '--border': '#214853',
+    '--input-bg': '#0b2028',
+    '--user-msg-bg': '#12313d',
+    '--assistant-msg-bg': '#0f2530',
+    '--active-bg': 'rgba(84, 242, 194, 0.18)',
+    '--hover-bg': 'rgba(84, 242, 194, 0.08)',
+  },
+  'Sakura Neon': {
+    '--bg': '#160b1d',
+    '--panel': '#251331',
+    '--text': '#fff5fd',
+    '--muted': '#d4abc7',
+    '--accent': '#ff4fb6',
+    '--border': '#52315f',
+    '--input-bg': '#1d1027',
+    '--user-msg-bg': '#341844',
+    '--assistant-msg-bg': '#281534',
+    '--active-bg': 'rgba(255, 79, 182, 0.18)',
+    '--hover-bg': 'rgba(255, 79, 182, 0.09)',
+  },
+  'Cobalt Punch': {
+    '--bg': '#081527',
+    '--panel': '#102643',
+    '--text': '#eef6ff',
+    '--muted': '#9fb7d0',
+    '--accent': '#ffb703',
+    '--border': '#234164',
+    '--input-bg': '#0d1f37',
+    '--user-msg-bg': '#162f54',
+    '--assistant-msg-bg': '#102640',
+    '--active-bg': 'rgba(255, 183, 3, 0.18)',
+    '--hover-bg': 'rgba(255, 183, 3, 0.08)',
+  },
+  'Mango Mojito': {
+    '--bg': '#fff7ea',
+    '--panel': '#ffe9c8',
+    '--text': '#2a1c13',
+    '--muted': '#7c6150',
+    '--accent': '#ff6b35',
+    '--border': '#e6bf91',
+    '--input-bg': '#fffdf9',
+    '--user-msg-bg': '#fff0d7',
+    '--assistant-msg-bg': '#fff8ed',
+    '--active-bg': 'rgba(255, 107, 53, 0.14)',
+    '--hover-bg': 'rgba(255, 107, 53, 0.08)',
+  },
+}
 
 function applyColorScheme(schemeName) {
-  const scheme = colorSchemes[schemeName];
-  if (scheme) {
-    for (const [key, value] of Object.entries(scheme)) {
-      document.documentElement.style.setProperty(key, value);
-    }
-  }
+  const scheme = colorSchemes[schemeName] || colorSchemes.Default
+  if (!scheme) return
+
+  for (const [key, value] of Object.entries(scheme)) {
+    document.documentElement.style.setProperty(key, value)
+  }
 }
 
-export { colorSchemes, applyColorScheme };
+export { colorSchemes, applyColorScheme }
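The key change to applyColorScheme is that unknown scheme names now resolve to the new Default palette instead of silently doing nothing. A runnable sketch of that lookup — the two-entry map is a trimmed stand-in for the real colorSchemes export, and `resolveScheme` is an illustrative name (the real function also writes the variables onto `document.documentElement`):

```javascript
// Trimmed stand-in for the colorSchemes map exported above.
const colorSchemes = {
  Default: { '--bg': '#0b1020', '--accent': '#6ea8fe' },
  Nightsky: { '--bg': '#0a0e1a', '--accent': '#4a90e2' },
}

// Mirrors the lookup in applyColorScheme: unknown or missing
// names fall back to the Default palette.
function resolveScheme(schemeName) {
  return colorSchemes[schemeName] || colorSchemes.Default
}

console.log(resolveScheme('Nightsky')['--bg'])      // → "#0a0e1a"
console.log(resolveScheme('No Such Theme')['--bg']) // → "#0b1020"
```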
src/main.jsx

@@ -9,9 +9,7 @@ import { applyColorScheme } from './colorSchemes'
 function Main() {
   useEffect(() => {
     window.electronAPI.getSettings().then(settings => {
-      if (settings.colorScheme) {
-        applyColorScheme(settings.colorScheme)
-      }
+      applyColorScheme(settings.colorScheme || 'Default')
     })
   }, [])
 
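The startup change above means a fresh install (no saved colorScheme) still gets a theme applied. A minimal sketch of that path with a stubbed bridge — the real `electronAPI` is exposed from electron/preload.cjs, and the stub and `initTheme` names here are illustrative assumptions:

```javascript
// Stub of the preload bridge: simulate a fresh install with no saved scheme.
const electronAPI = {
  getSettings: async () => ({}),
}

// Load settings, then apply the saved scheme or fall back to 'Default'.
async function initTheme(applyScheme) {
  const settings = await electronAPI.getSettings()
  const name = settings.colorScheme || 'Default'
  applyScheme(name)
  return name
}

initTheme(name => console.log('applying color scheme:', name)) // → "applying color scheme: Default"
```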
157
src/styles.css
@@ -245,6 +245,22 @@ body { background: var(--bg); color: var(--text); font-family: ui-sans-serif, sy
   background: var(--panel);
 }
 
+.new-db-form,
+.library-inline-form {
+  display: grid;
+  gap: 8px;
+}
+
+.new-db-actions {
+  display: flex;
+  gap: 8px;
+}
+
+.form-error {
+  color: #ff9aa8;
+  font-size: 12px;
+}
+
 .new-chat-button {
   width: 100%;
   padding: 10px;
@@ -291,6 +307,13 @@ body { background: var(--bg); color: var(--text); font-family: ui-sans-serif, sy
 .select { min-width: 220px; }
 .button { cursor: pointer; }
 .button:hover { border-color: var(--accent); }
+.button.ghost { background: transparent; }
+.button.danger { border-color: #8f3d49; color: #ffb8c2; }
+.button.danger:hover { border-color: #d86a79; }
+.header-subtle {
+  color: var(--muted);
+  font-size: 13px;
+}
 
 .chat {
   display: grid;
@@ -510,6 +533,30 @@ textarea.input {
   min-width: unset;
 }
 
+.setting-control-row {
+  display: flex;
+  align-items: center;
+  gap: 12px;
+  flex-wrap: wrap;
+}
+
+.range-input {
+  width: min(360px, 100%);
+  accent-color: var(--accent);
+}
+
+.setting-value {
+  min-width: 48px;
+  color: var(--text);
+  font-variant-numeric: tabular-nums;
+}
+
+.setting-description {
+  margin: 10px 0 0;
+  color: var(--muted);
+  line-height: 1.5;
+}
+
 /* Markdown Styles */
 .msg h1, .msg h2, .msg h3, .msg h4 {
   margin: 10px 0;
@@ -972,3 +1019,113 @@ input:checked + .slider:before {
   white-space: nowrap;
   margin-top: 0.5rem;
 }
+
+.db-active-badge {
+  margin-left: 8px;
+  padding: 2px 8px;
+  border-radius: 999px;
+  background: color-mix(in srgb, var(--accent) 20%, transparent);
+  color: var(--accent);
+  font-size: 11px;
+}
+
+.placeholder-view,
+.library-panel {
+  overflow: auto;
+  padding: 20px;
+}
+
+.placeholder-view h1 {
+  margin-top: 0;
+}
+
+.library-toolbar {
+  display: flex;
+  flex-wrap: wrap;
+  gap: 10px;
+  margin-bottom: 18px;
+}
+
+.library-states {
+  display: flex;
+  flex-wrap: wrap;
+  gap: 10px;
+  margin-bottom: 14px;
+}
+
+.state-pill {
+  padding: 6px 10px;
+  border-radius: 999px;
+  border: 1px solid var(--border);
+  color: var(--muted);
+  font-size: 13px;
+}
+
+.state-pill.ready {
+  color: var(--text);
+  border-color: color-mix(in srgb, var(--accent) 45%, var(--border));
+}
+
+.library-chat-note,
+.job-card {
+  margin-bottom: 12px;
+  padding: 12px 14px;
+  border-radius: 12px;
+  background: color-mix(in srgb, var(--panel) 82%, black);
+  border: 1px solid var(--border);
+}
+
+.library-inline-form {
+  margin-bottom: 14px;
+  padding: 12px 14px;
+  border-radius: 12px;
+  border: 1px solid var(--border);
+  background: color-mix(in srgb, var(--panel) 88%, black);
+}
+
+.danger-zone {
+  border-color: #8f3d49;
+}
+
+.library-files h2 {
+  margin: 18px 0 12px;
+  font-size: 16px;
+}
+
+.library-file-list {
+  display: grid;
+  gap: 10px;
+}
+
+.library-file-row {
+  display: flex;
+  justify-content: space-between;
+  gap: 14px;
+  align-items: flex-start;
+  padding: 12px 14px;
+  border-radius: 12px;
+  border: 1px solid var(--border);
+  background: color-mix(in srgb, var(--panel) 88%, black);
+}
+
+.library-file-meta {
+  min-width: 0;
+}
+
+.library-file-name {
+  font-weight: 600;
+  margin-bottom: 4px;
+}
+
+.library-file-path,
+.muted-copy {
+  color: var(--muted);
+  font-size: 13px;
+  word-break: break-word;
+}
+
+.library-file-actions {
+  display: flex;
+  gap: 8px;
+  flex-shrink: 0;
+}
BIN
wheelcheck2117/pydantic-2.11.7-py3-none-any.whl
Normal file
Binary file not shown.
BIN
wheelcheck274/pydantic-2.7.4-py3-none-any.whl
Normal file
Binary file not shown.