# Core libraries used by the GUI (concept_gui.py) and corpus builder
pymupdf
beautifulsoup4
requests
chardet
pillow
numpy
tqdm
ebooklib
markdown
tkinterdnd2
pdflatex

# Optional: language detection and image text-likeness improvements
langid
opencv-python-headless

# Optional (ASR for audio/video in corpus_builder.py)
openai-whisper
torch

# Notes (external binaries, install via system package manager):
# - pandoc (for high‑quality Markdown→PDF)
# - wkhtmltopdf (engine for HTML→PDF; local file access is enabled via --enable-local-file-access)
# - tesseract (OCR CLI)
# - ffmpeg/ffprobe (media handling)
# - ocrmypdf (for scanned PDFs)
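#
# Example (an assumption, not part of the original notes): on Debian/Ubuntu the
# external tools listed above are usually packaged under these names:
#   sudo apt-get install pandoc wkhtmltopdf tesseract-ocr ffmpeg ocrmypdf
# Package names and availability may differ on other platforms.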