initial commit

2025-09-12 21:45:11 +02:00
commit d8e4d77687
6 changed files with 4000 additions and 0 deletions

requirements.txt Normal file

@@ -0,0 +1,27 @@
# Core libraries used by the GUI (concept_gui.py) and corpus builder
pymupdf
beautifulsoup4
requests
chardet
pillow
numpy
tqdm
ebooklib
markdown
tkinterdnd2
pdflatex
# Optional: language detection and image text-likeness improvements
langid
opencv-python-headless
# Optional (ASR for audio/video in corpus_builder.py)
openai-whisper
torch
# Notes (external binaries, install via system package manager):
# - pandoc (for high-quality Markdown→PDF)
# - wkhtmltopdf (engine for HTML→PDF; enables local file access)
# - tesseract (OCR CLI)
# - ffmpeg/ffprobe (media handling)
# - ocrmypdf (for scanned PDFs)
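
The external binaries listed in the notes are not installed by pip, so a missing one would likely only show up at runtime. Below is a minimal sketch of a startup check, assuming each tool is invoked by the name given in the notes and is expected on PATH. The names `EXTERNAL_TOOLS` and `find_missing_tools` are hypothetical and do not come from the repository.

```python
import shutil

# Binary names taken from the notes in requirements.txt.
# Assumption: each tool is called by exactly this name and found via PATH.
EXTERNAL_TOOLS = ["pandoc", "wkhtmltopdf", "tesseract", "ffmpeg", "ffprobe", "ocrmypdf"]


def find_missing_tools(tools=EXTERNAL_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be located on PATH.

    `which` is injectable so the check can be exercised without the
    real binaries being installed.
    """
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing external tools: " + ", ".join(missing))
    else:
        print("All external tools found on PATH.")
```

Run as a script, this prints whichever of the listed tools are absent. Whether a missing tool should be a hard error or only disable the matching feature (OCR, ASR, PDF export) is a design choice left open here.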