Eric Jin
Python · RAG · FAISS · Azure OpenAI · GPT-4o · LLM · Streamlit

RAG Financial Analysis System

LLM-powered Q&A over Uber, Lyft, and United 10-K filings using FAISS and GPT-4o

Overview

Built a production-grade Retrieval-Augmented Generation (RAG) pipeline over SEC 10-K filings for Uber, Lyft, and United Airlines. The system uses dual FAISS vector indices (text + tables) with Azure OpenAI embeddings to retrieve relevant passages, then passes them to GPT-4o for grounded, citation-aware financial analysis. The result is natural-language Q&A over thousands of pages of structured financial data.

Methods

  • Document ingestion: parsed Uber, Lyft, and United 10-K PDFs into 2,626 text chunks and 712 table chunks
  • Dual-index architecture: separate FAISS indices for narrative text and financial tables to preserve table structure during retrieval
  • Embedding: Azure OpenAI text-embedding-3-small for both indexing and query encoding
  • Retrieval: top-k=15 nearest neighbor search across both indices, combined into a unified context window
  • Generation: GPT-4o (Azure-hosted) with a structured prompt enforcing comparison, trend analysis, and actionable insights
  • Frontend: Streamlit interface for interactive query submission and response rendering
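
The retrieval and context-assembly steps above can be sketched as follows. This is a minimal illustration, not the project's code: it swaps FAISS for a brute-force cosine search in plain Python so it runs without dependencies, uses made-up 3-dimensional vectors in place of text-embedding-3-small embeddings, and assumes that k is applied to each index separately before the hits are merged. The names Chunk, search, retrieve, and build_context are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    kind: str             # "text" or "table": which index the chunk belongs to
    vector: List[float]   # embedding; toy 3-d values stand in for real embeddings


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, query_vec, k):
    """Brute-force stand-in for a FAISS search: top-k (score, chunk) pairs."""
    scored = [(cosine(c.vector, query_vec), c) for c in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def retrieve(text_index, table_index, query_vec, k=15):
    """Query both indices (k per index, an assumption) and merge by score."""
    hits = search(text_index, query_vec, k) + search(table_index, query_vec, k)
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in hits]


def build_context(chunks):
    """Label each chunk with its source index so the answer can cite it."""
    return "\n\n".join(f"[{c.kind} {i + 1}] {c.text}" for i, c in enumerate(chunks))


# Toy data, invented for illustration only.
text_index = [
    Chunk("Uber describes its Mobility and Delivery segments.", "text", [0.9, 0.1, 0.0]),
    Chunk("Lyft discusses rider incentives.", "text", [0.1, 0.9, 0.0]),
]
table_index = [
    Chunk("Revenue | 2023 | 2022", "table", [0.8, 0.0, 0.6]),
]
context = build_context(retrieve(text_index, table_index, [1.0, 0.0, 0.0], k=1))
```

Keeping the two indices separate means a query always gets candidates from the tables as well as from the narrative text, even when prose chunks would otherwise crowd the top of a single ranked list.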

Key Findings

  • Dual-index retrieval improves coverage: the table index captures numerical data that text search misses
  • GPT-4o accurately compares multi-company financials: Uber 2023 revenue $37.28B vs Lyft $4.40B (8.5× gap), with correct growth rates (17% vs 7.5%)
  • System correctly identifies revenue trends, cost structures, and year-over-year changes from raw financial tables
  • Structured prompting (compare + trend + implication + next steps) produces analyst-quality responses rather than raw retrieval
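
A structured prompt of the kind described in the last point might be assembled like this. The wording, the SECTIONS tuple, and the build_prompt helper are all hypothetical. The project's actual prompt text is not shown here, so treat this as one plausible shape rather than the prompt that was used.

```python
# Hypothetical section headings mirroring "compare + trend + implication + next steps".
SECTIONS = ("Comparison", "Trend", "Implication", "Next steps")


def build_prompt(question, context):
    """Combine retrieved context and the question under a fixed answer structure."""
    headings = "\n".join(f"{i}. {name}" for i, name in enumerate(SECTIONS, start=1))
    return (
        "You are a financial analyst. Answer using only the context below "
        "and cite the bracketed chunk labels you rely on.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        f"Structure the answer under these headings:\n{headings}\n"
    )


prompt = build_prompt(
    "How did Uber's revenue compare with Lyft's?",
    "[table 1] Revenue | 2023 | 2022",
)
```

The resulting string would then be sent to the chat model as part of the request. Fixing the headings up front is what pushes the model toward a comparison and a recommendation instead of a restatement of the retrieved passages.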

Results

2,626 text chunks + 712 table chunks indexed across 3 company 10-K filings

Accurate cross-company financial comparison: Uber revenue 8.5× Lyft with 2.3× higher growth rate
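
The two multiples follow directly from the figures reported under Key Findings, as this short check shows. The variable names are illustrative only.

```python
# Figures as reported under Key Findings.
uber_revenue_b, lyft_revenue_b = 37.28, 4.40   # 2023 revenue, USD billions
uber_growth_pct, lyft_growth_pct = 17.0, 7.5   # year-over-year growth, percent

revenue_ratio = uber_revenue_b / lyft_revenue_b
growth_ratio = uber_growth_pct / lyft_growth_pct

print(f"revenue ratio: {revenue_ratio:.1f}x")   # prints "revenue ratio: 8.5x"
print(f"growth ratio: {growth_ratio:.1f}x")     # prints "growth ratio: 2.3x"
```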

Sub-second retrieval latency from FAISS approximate nearest neighbor search

Full Streamlit UI for interactive natural language queries over financial documents