search-and-analyze
Author: Victor Giers
Overview
This project contains a Python script that turns a user prompt into search queries, runs those queries against the web, extracts the content of the resulting pages, and analyzes it for information relevant to the prompt. Query generation and analysis are handled by configurable language models, and pages are loaded with Playwright so that JavaScript-rendered content can be scraped.
Features
- Query Generation: Generates 5 optimized search queries from the user input using a specified LLM.
- Web Search: Searches the web using SearXNG to find relevant URLs.
- Content Extraction: Extracts content from web pages, rendering JavaScript if necessary.
- Analysis: Analyzes extracted text for relevance and provides summaries or specific information based on user queries.
- Fallback Mechanism: Handles failures in data extraction or analysis by moving to the next URL.
- Parallel Processing: Loads and analyzes up to 5 URLs concurrently.
- Language Support: Automatically detects and supports multiple languages using langid and langcodes.
- NLTK Integration: Supports NLTK for text summarization as an alternative to external LLM models.
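The parallel processing and fallback features can be illustrated with a minimal sketch. It is not taken from the script itself: load_and_analyze is a hypothetical stand-in for the real extraction and analysis step, and the use of asyncio.Semaphore to cap concurrency at 5 is an assumption about how such a limit could be enforced.

```python
import asyncio
from typing import Optional

MAX_CONCURRENT = 5  # the feature list says up to 5 URLs are handled at once


async def load_and_analyze(url: str) -> str:
    """Hypothetical stand-in for the real extraction and analysis step."""
    await asyncio.sleep(0)  # placeholder for network and model latency
    if "broken" in url:
        raise RuntimeError(f"could not extract content from {url}")
    return f"summary of {url}"


async def process_urls(urls: list) -> list:
    """Process URLs concurrently; a failing URL is skipped, not fatal."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)

    async def guarded(url: str) -> Optional[str]:
        async with semaphore:  # at most MAX_CONCURRENT tasks run this body at once
            try:
                return await load_and_analyze(url)
            except Exception as exc:
                # Fallback: report the failure and let the remaining URLs proceed.
                print(f"skipping {url}: {exc}")
                return None

    # gather() returns results in the same order as the input URLs.
    results = await asyncio.gather(*(guarded(u) for u in urls))
    return [r for r in results if r is not None]
```

Calling asyncio.run(process_urls(urls)) with a list of URLs returns only the summaries that succeeded, in input order, so one unreachable page does not abort the whole run.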
Requirements
- Python 3.x
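- Required Python packages: requests, langid, langcodes, newspaper3k, nltk, playwright, langchain_core, langchain_ollama
- Standard-library modules used (no installation needed): asyncio, argparse, urllib, json, re, termios, atexit, signal, time, collections. Note that termios is only available on Unix-like systems.
- SearXNG running locally at http://127.0.0.1:8888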
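To show how a local SearXNG instance can be queried, here is a small sketch. It assumes the instance has the JSON output format enabled in its settings and that results arrive under a "results" key with a "url" field per entry. The function names build_search_url, extract_urls, and search are hypothetical and do not come from the script.

```python
from urllib.parse import urlencode

SEARXNG_URL = "http://127.0.0.1:8888"  # address given in the requirements


def build_search_url(query: str, base_url: str = SEARXNG_URL) -> str:
    """Build a SearXNG search URL that asks for JSON output."""
    return f"{base_url}/search?{urlencode({'q': query, 'format': 'json'})}"


def extract_urls(payload: dict, limit: int = 5) -> list:
    """Collect result URLs from a SearXNG JSON response, dropping duplicates."""
    seen = []
    for item in payload.get("results", []):
        url = item.get("url")
        if url and url not in seen:
            seen.append(url)
    return seen[:limit]


def search(query: str) -> list:
    """Run one query against the local instance (needs a running SearXNG)."""
    import requests  # third-party; imported here so the helpers above need no network

    response = requests.get(build_search_url(query), timeout=10)
    response.raise_for_status()
    return extract_urls(response.json())
```

If search raises an HTTP error, a likely cause is that the JSON format is not enabled on the instance, which is worth checking before running the script.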
Installation
pip install requests langid langcodes newspaper3k nltk playwright langchain_core langchain_ollama
playwright install chromium
Usage
Run the script with a user prompt and optional arguments for analysis and query models.
python search-and-analyze.py "your search query" --analysis-model mistral-small3.1:24b --query-model mistral:latest
- --analysis-model: Specifies the AI model to use for content analysis (e.g., mistral-small3.1:24b). Use NLTK for a local summary using Newspaper3k.
- --query-model: Specifies the LLM to use for generating search queries (default is mistral:latest).
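The command-line interface described above could be declared with argparse roughly as follows. This is a sketch, not the script's actual parser: the positional name prompt, the None default for --analysis-model, and the exact-match check for NLTK are assumptions. Only the mistral:latest default for --query-model comes from the description above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the prompt and the two model options described above."""
    parser = argparse.ArgumentParser(
        prog="search-and-analyze.py",
        description="Search the web for a prompt and analyze the results.",
    )
    parser.add_argument("prompt", help="user prompt to research")
    parser.add_argument(
        "--analysis-model",
        default=None,  # assumed; the real default is not documented here
        help="model for content analysis, or NLTK for a local summary",
    )
    parser.add_argument(
        "--query-model",
        default="mistral:latest",  # documented default
        help="LLM used to generate search queries",
    )
    return parser


def use_nltk(args: argparse.Namespace) -> bool:
    """Return True when the local NLTK summary path is requested."""
    return args.analysis_model == "NLTK"
```

argparse maps the dashed option names to the attributes analysis_model and query_model, so the rest of a script can branch on use_nltk(args) before deciding whether to contact an external model.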
Example
python search-and-analyze.py "origin of Valentine’s Day" --analysis-model NLTK
This command will generate search queries related to the origin of Valentine's Day, perform web searches, extract and summarize content from relevant pages, and output the results.
License
This project is licensed under the MIT License - see the LICENSE file for details.