This project is an AI-powered technical article generator that uses ArXiv papers as its knowledge base. The system consists of two main components:
- arxivdatabase.py: A script to fetch and store ArXiv papers in a vector database
- arxivapp.py: A Streamlit web application that generates technical articles based on user queries
Key features:
- Fetches thousands of ArXiv papers across multiple computer science categories
- Stores papers in a Chroma vector database with embeddings
- Provides a user-friendly web interface to generate technical articles
- Retrieves relevant research papers based on the user's topic
- Generates structured academic papers with proper citations
Requirements:
- Python 3.8+
- OpenAI API key
- Internet connection (for fetching ArXiv data)
Installation:
- Clone the repository and enter its directory:
  `git clone https://github.com/HappyHackingSpace/AI-Research-Writers-Tool.git`
  `cd AI-Research-Writers-Tool`
- Install dependencies:
  `pip install -r requirements.txt`
- Create a `.env` file in the project root and add your OpenAI API key:
  `OPENAI_API_KEY=your_api_key_here`
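The scripts need `OPENAI_API_KEY` in the environment at startup. Projects like this one often load `.env` with the `python-dotenv` package, but which loader these scripts use is an assumption here. The stdlib-only sketch below shows what that loading step amounts to; `load_env_file` and `require_api_key` are hypothetical helper names, not functions from this repository.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Copy KEY=value lines from a .env file into os.environ (existing variables win)."""
    env_path = Path(path)
    if not env_path.exists():
        return  # No .env file: rely on variables already set in the shell.
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an '=' separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding whitespace and any quote characters around the value.
        os.environ.setdefault(key.strip(), value.strip().strip("\"'"))


def require_api_key() -> str:
    """Return the OpenAI key, or fail early with a readable message."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```

Calling `load_env_file()` and then `require_api_key()` at startup turns a missing key into a clear error message rather than a failed API call partway through a long run.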
Run the database builder script to fetch ArXiv papers and create the vector database:

`python arxivdatabase.py`

This process may take several hours, depending on how many papers you fetch (default: 5000 per category).
Note: You can adjust the `num_papers` variable in the script to change the number of papers fetched per category.
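To see why `num_papers` drives the run time, note that the arXiv API returns results in pages. The sketch below turns `num_papers` and the per-request page size `results_article` (both with the defaults listed under the configurable parameters) into one query URL per page. The endpoint and the `search_query`, `start`, and `max_results` parameters belong to the public arXiv API. Whether `arxivdatabase.py` builds its requests this way, or goes through a client library, is an assumption, and `page_urls` is a hypothetical helper.

```python
from typing import List
from urllib.parse import urlencode

# Public arXiv API query endpoint.
ARXIV_API = "http://export.arxiv.org/api/query"


def page_urls(category: str, num_papers: int = 5000, results_article: int = 200) -> List[str]:
    """Build one arXiv API query URL per page of results for a single category."""
    urls = []
    for start in range(0, num_papers, results_article):
        # Shrink the final page so the total never exceeds num_papers.
        page_size = min(results_article, num_papers - start)
        query = urlencode({
            "search_query": f"cat:{category}",
            "start": start,
            "max_results": page_size,
        })
        urls.append(f"{ARXIV_API}?{query}")
    return urls
```

With the defaults, `page_urls("cs.AI")` produces 25 requests for that category alone. Multiply by the number of categories and add a pause between requests, and the multi-hour estimate above follows.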
After building the database, run the Streamlit application:

`streamlit run arxivapp.py`

The web interface will open in your browser. Enter a topic in the text field and click "Generate Article" to create a technical article based on relevant ArXiv papers.
How arxivdatabase.py works:
- Fetches papers from ArXiv across multiple computer science categories
- Extracts title, authors, summary, and other metadata
- Splits the text into chunks suitable for embedding
- Creates embeddings using OpenAI's text-embedding-3-small model
- Stores the embedded documents in a Chroma vector database
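The chunking step is controlled by `chunk_size` (default 2000) and `chunk_overlap` (default 100). This README does not say which splitter the script uses. LangChain's `RecursiveCharacterTextSplitter` is a common choice in projects like this and also prefers to break on separators such as paragraphs. The sketch below is a simpler fixed-window character splitter. It assumes sizes are counted in characters and shows only how overlap carries context across chunk boundaries. `split_text` is a hypothetical name.

```python
from typing import List


def split_text(text: str, chunk_size: int = 2000, chunk_overlap: int = 100) -> List[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # How far each window advances.
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

A 4500-character summary yields chunks of 2000, 2000, and 700 characters, and each chunk begins with the last 100 characters of the previous one.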
How arxivapp.py works:
- Accepts a topic from the user via the Streamlit interface
- Searches the vector database for relevant research papers
- Allows the user to select the number of references to use
- Generates a structured academic paper using OpenAI's GPT-4o model
- Displays the generated article with proper formatting
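In the app, Chroma performs the vector search. The pure-Python sketch below shows the underlying idea. It ranks stored chunk embeddings by similarity to the query embedding, keeps the top `k` (the user-selected number of references), and numbers them so the model can cite them as [1], [2], and so on. Cosine similarity is one common metric. The metric the app's Chroma collection actually uses is not stated here. The dict layout, the helper names, and the toy 2-dimensional vectors are hypothetical. Real `text-embedding-3-small` vectors have 1536 dimensions by default.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], docs: List[Dict], k: int) -> List[Dict]:
    """Return the k documents whose embeddings are most similar to the query."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d["embedding"]), reverse=True)
    return ranked[:k]


def build_context(docs: List[Dict]) -> str:
    """Number each retrieved paper so the generated article can cite it."""
    return "\n".join(
        f"[{i}] {doc['title']} ({doc['authors']}): {doc['summary']}"
        for i, doc in enumerate(docs, start=1)
    )
```

The numbered context string is what would be placed in the prompt for the LLM, which keeps the citation markers in the article traceable to specific retrieved papers.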
You can modify the following parameters in the scripts:
- `num_papers`: Number of papers to fetch per category (default: 5000)
- `results_article`: Maximum number of results per ArXiv API request (default: 200)
- `batch_size`: Number of documents to add to the vector database in each batch (default: 5000)
- `chunk_size`: Size of text chunks for embedding (default: 2000)
- `chunk_overlap`: Overlap between consecutive chunks (default: 100)
- `model`: LLM used for generation (default: "gpt-4o")
- `temperature`: Sampling temperature for the LLM; higher values produce more varied output (default: 0.4)
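The scripts appear to define these parameters as plain variables. The sketch below groups them into a hypothetical `Settings` dataclass with the documented defaults, so they are easy to see together. It also shows how `batch_size` bounds each write to the vector database. Both `Settings` and `batched` are illustrative names, not part of the repository.

```python
from dataclasses import dataclass
from typing import Iterator, List, TypeVar

T = TypeVar("T")


@dataclass
class Settings:
    """Hypothetical container for the tunable values above, with their documented defaults."""
    num_papers: int = 5000
    results_article: int = 200
    batch_size: int = 5000
    chunk_size: int = 2000
    chunk_overlap: int = 100
    model: str = "gpt-4o"
    temperature: float = 0.4


def batched(items: List[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

For example, 12,001 chunks with the default `batch_size` would be written in three calls of 5000, 5000, and 2001 documents. Smaller batches trade more calls for lower peak memory and smaller embedding requests.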
Limitations:
- The ArXiv API is rate-limited, so the fetching process pauses between requests
- Generating articles for very niche topics may result in less relevant content
- The quality of generated articles depends on the availability of relevant papers in the database
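The pause between ArXiv requests can be as simple as sleeping before every call except the first. The arXiv API user manual asks clients to wait about three seconds between consecutive calls. The actual interval used in `arxivdatabase.py` is not stated in this README, so the 3.0 default below is an assumption. In this sketch, `fetch` stands in for whatever HTTP call the script makes, for example `urllib.request.urlopen(url).read()`. `sleep` is injectable so the function can be tested without waiting. `fetch_all` is a hypothetical name, and it does not retry failed requests.

```python
import time
from typing import Callable, Iterable, List


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in order, pausing between requests to stay under rate limits."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            # Wait before every request except the first.
            sleep(delay_seconds)
        results.append(fetch(url))
    return results
```

At three seconds per request, 25 pages for a single category already take more than a minute of waiting. This is one reason a full multi-category build is slow.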
This project is licensed under the MIT License - see the LICENSE file for details.