2025-05-21 04:26:22 +02:00

search-and-analyze

Author: Victor Giers

Overview

This project contains a Python script that automates a search-and-analysis pipeline: it generates search queries from user input, runs web searches, extracts content from the resulting pages, and analyzes that content for information relevant to the prompt. Language models handle query generation and analysis, Playwright renders pages for scraping, and the model used for each step is configurable.
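
The query-generation step has to turn free-form model output into a fixed number of search queries. The following is a minimal sketch of that idea, assuming the model answers with one query per line, possibly numbered or bulleted. The function name `parse_queries` and the parsing rule are hypothetical and not taken from the script.

```python
import re

NUM_QUERIES = 5  # the README states that 5 queries are generated

# Leading list markers such as "1.", "2)", "-", "*" or "•".
_LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s*")


def parse_queries(raw_output, limit=NUM_QUERIES):
    """Turn raw model output into a list of unique, non-empty queries."""
    queries = []
    for line in raw_output.splitlines():
        # Drop the list marker, surrounding whitespace and wrapping quotes.
        query = _LIST_MARKER.sub("", line).strip().strip('"')
        if query and query not in queries:
            queries.append(query)
    return queries[:limit]
```

Capping the list at `limit` keeps the number of web searches predictable even when the model returns more lines than requested.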

Features

  1. Query Generation: Generates 5 search queries from the user input using the specified LLM.
  2. Web Search: Searches the web using SearXNG to find relevant URLs.
  3. Content Extraction: Extracts content from web pages, rendering JavaScript if necessary.
  4. Analysis: Analyzes extracted text for relevance and provides summaries or specific information based on user queries.
  5. Fallback Mechanism: Handles failures in data extraction or analysis by moving to the next URL.
  6. Parallel Processing: Loads and analyzes up to 5 URLs concurrently.
  7. Language Support: Automatically detects and supports multiple languages using langid and langcodes.
  8. NLTK Integration: Supports NLTK for text summarization as a local alternative to an LLM.
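
The parallel processing and fallback features can be pictured together as a bounded pool of tasks in which a failing URL is dropped and the remaining ones continue. The sketch below uses only asyncio. The names `process_url`, `process_all`, `fake_fetch` and `fake_analyze` are hypothetical stand-ins, and the script's real page loader and analyzer may be structured differently.

```python
import asyncio

MAX_CONCURRENT = 5  # the README states up to 5 URLs are handled concurrently


async def process_url(url, fetch, analyze, semaphore):
    """Fetch and analyze one URL; return None on any failure."""
    async with semaphore:
        try:
            text = await fetch(url)
            return await analyze(text)
        except Exception:
            # Fallback: give up on this URL so the others can still succeed.
            return None


async def process_all(urls, fetch, analyze, limit=MAX_CONCURRENT):
    """Process all URLs, at most `limit` at a time, keeping only successes."""
    semaphore = asyncio.Semaphore(limit)
    tasks = [process_url(u, fetch, analyze, semaphore) for u in urls]
    results = await asyncio.gather(*tasks)
    return [(u, r) for u, r in zip(urls, results) if r is not None]


# Stand-ins for the real page loader and analyzer, for demonstration only.
async def fake_fetch(url):
    if "broken" in url:
        raise RuntimeError("page could not be loaded")
    return f"content of {url}"


async def fake_analyze(text):
    return text.upper()


if __name__ == "__main__":
    urls = ["https://a.example", "https://broken.example", "https://c.example"]
    print(asyncio.run(process_all(urls, fake_fetch, fake_analyze)))
```

The semaphore caps how many pages are in flight at once, while catching the exception inside each task keeps one bad page from cancelling the whole batch.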

Requirements

  • Python 3.x
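  • Third-party Python packages: requests, langid, langcodes, newspaper3k, nltk, playwright, langchain_core, langchain_ollama
  • Standard-library modules (no installation needed): asyncio, argparse, urllib, json, re, termios, atexit, signal, time, collections. Note that termios is available only on Unix-like systems.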
  • SearXNG running locally at http://127.0.0.1:8888
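
To illustrate how a local SearXNG instance can be queried, here is a minimal sketch that builds a search URL and pulls result URLs out of the JSON response. It assumes the instance has the JSON output format enabled in its settings. The helper names (`build_search_url`, `extract_urls`, `search`) are hypothetical, and the script itself may use requests rather than urllib.

```python
import json
import urllib.parse
import urllib.request

SEARXNG_URL = "http://127.0.0.1:8888"  # address from the Requirements section


def build_search_url(query, base_url=SEARXNG_URL):
    """Build a SearXNG search URL that asks for JSON output."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{base_url}/search?{params}"


def extract_urls(payload, limit=5):
    """Collect result URLs from a decoded SearXNG response, without duplicates."""
    seen = []
    for item in payload.get("results", []):
        url = item.get("url")
        if url and url not in seen:
            seen.append(url)
    return seen[:limit]


def search(query, timeout=10):
    """Query the local instance; needs SearXNG running with JSON enabled."""
    with urllib.request.urlopen(build_search_url(query), timeout=timeout) as resp:
        return extract_urls(json.load(resp))
```

Opening the URL produced by `build_search_url("test")` in a browser is a quick way to check that the instance is reachable and returns JSON before running the script.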

Installation

pip install requests langid langcodes newspaper3k nltk playwright langchain_core langchain_ollama
playwright install chromium

Usage

Run the script with a user prompt and, optionally, arguments that select the analysis and query models.

python search-and-analyze.py "your search query" --analysis-model mistral-small3.1:24b --query-model mistral:latest

  • --analysis-model: Specifies the model to use for content analysis (e.g., mistral-small3.1:24b). Pass NLTK to produce a local summary with Newspaper3k and NLTK instead of calling an LLM.
  • --query-model: Specifies the LLM to use for generating search queries (default is mistral:latest).
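
The command-line interface described above could be declared with argparse roughly as follows. This is a sketch, not the script's actual parser: the README does not state a default for --analysis-model, so `None` is an assumption here, as are the exact-match check for NLTK and the helper names `build_parser` and `uses_local_summary`.

```python
import argparse


def build_parser():
    """Declare the prompt and the two model options named in the README."""
    parser = argparse.ArgumentParser(prog="search-and-analyze.py")
    parser.add_argument("prompt", help="user prompt to research")
    parser.add_argument(
        "--analysis-model",
        default=None,  # assumption: the README gives no default for this option
        help='model for content analysis, or "NLTK" for a local summary',
    )
    parser.add_argument(
        "--query-model",
        default="mistral:latest",  # default stated in the README
        help="LLM used to generate search queries",
    )
    return parser


def uses_local_summary(args):
    """Return True when the NLTK summary path was requested (assumed exact match)."""
    return args.analysis_model == "NLTK"


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.prompt, args.analysis_model, args.query_model)
```

argparse converts the dashes in option names to underscores, so `--analysis-model` is read back as `args.analysis_model`.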

Example

python search-and-analyze.py "origin of Valentines Day" --analysis-model NLTK

This command will generate search queries related to the origin of Valentine's Day, perform web searches, extract and summarize content from relevant pages, and output the results.

License

This project is licensed under the MIT License - see the LICENSE file for details.
