Open Sources

Curated repos, tools, and frameworks shaping the developer ecosystem.
Live data from GitHub.

PageIndex | Open Source Review | Trending Society

...
{
  "title": "Financial Stability",
  "node_id": "0006",
  "start_index": 21,
  "end_index": 22,
  "summary": "The Federal Reserve ...",
  "nodes": [
    {
      "title": "Monitoring Financial Vulnerabilities",
      "node_id": "0007",
      "start_index": 22,
      "end_index": 28,
      "summary": "The Federal Reserve's monitoring ..."
    },
    {
      "title": "Domestic and International Cooperation and Coordination",
      "node_id": "0008",
      "start_index": 28,
      "end_index": 31,
      "summary": "In 2023, the Federal Reserve collaborated ..."
    }
  ]
}
...
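
To make the shape of this tree concrete, here is a minimal Python sketch that walks the sample above. It assumes that `start_index` and `end_index` are page indices and that the range is inclusive at both ends; the excerpt does not state either. The helper names `iter_nodes` and `nodes_covering` are hypothetical and are not part of PageIndex.

```python
from typing import Iterator, List, Tuple

# Trimmed copy of the sample tree above (summaries omitted).
tree = {
    "title": "Financial Stability",
    "node_id": "0006",
    "start_index": 21,
    "end_index": 22,
    "nodes": [
        {"title": "Monitoring Financial Vulnerabilities",
         "node_id": "0007", "start_index": 22, "end_index": 28},
        {"title": "Domestic and International Cooperation and Coordination",
         "node_id": "0008", "start_index": 28, "end_index": 31},
    ],
}


def iter_nodes(node: dict, depth: int = 0) -> Iterator[Tuple[int, dict]]:
    """Yield (depth, node) pairs in pre-order; leaves may omit "nodes"."""
    yield depth, node
    for child in node.get("nodes", []):
        yield from iter_nodes(child, depth + 1)


def nodes_covering(node: dict, page: int) -> List[str]:
    """Return the node_ids whose index range contains `page`.

    Assumption: the range is inclusive at both ends.
    """
    return [n["node_id"] for _, n in iter_nodes(node)
            if n["start_index"] <= page <= n["end_index"]]


if __name__ == "__main__":
    # Print an indented outline of the tree.
    for depth, n in iter_nodes(tree):
        print("  " * depth + n["node_id"], n["title"])
    # Under the inclusive assumption, a boundary page belongs to both siblings.
    print(nodes_covering(tree, 28))
```

Because adjacent nodes in the sample share boundary indices (22 and 28), a lookup by page can return more than one node under this reading.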

pip3 install --upgrade -r requirements.txt

OPENAI_API_KEY=your_openai_key_here
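
Before a long run, it can help to confirm that the key is actually set and is not still the placeholder shown above. The sketch below is only a pre-flight check under the assumption that the key reaches the process as the environment variable `OPENAI_API_KEY`; how PageIndex itself loads the key is not described here, and `api_key_problem` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

PLACEHOLDER = "your_openai_key_here"  # the placeholder value shown above


def api_key_problem(env: Mapping[str, str]) -> Optional[str]:
    """Return a description of what is wrong with the key, or None if it looks set."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        return "OPENAI_API_KEY is not set"
    if key == PLACEHOLDER:
        return "OPENAI_API_KEY still holds the placeholder value"
    return None


if __name__ == "__main__":
    problem = api_key_problem(os.environ)
    print(problem or "OPENAI_API_KEY looks set")
```

The check only inspects the value locally; it does not contact the API or verify that the key is valid.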

python3 run_pageindex.py --pdf_path /path/to/your/document.pdf

Optional parameters

You can customize the processing with additional optional arguments:

--model                   LLM model to use (default: gpt-4o-2024-11-20)
--toc-check-pages         Pages to check for table of contents (default: 20)
--max-pages-per-node      Max pages per node (default: 10)
--max-tokens-per-node     Max tokens per node (default: 20000)
--if-add-node-id          Add node ID (yes/no, default: yes)
--if-add-node-summary     Add node summary (yes/no, default: yes)
--if-add-doc-description  Add doc description (yes/no, default: yes)
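
When the script is driven from another program, the options above can be assembled into an argument list. The following is a small sketch, not part of PageIndex: `build_command` is a hypothetical helper that turns keyword arguments into the dashed flag names listed above, while keeping `--pdf_path` spelled with an underscore as in the run command.

```python
import shlex
from typing import List


def build_command(pdf_path: str, **options: object) -> List[str]:
    """Build an argv list for run_pageindex.py.

    Keyword names use underscores in place of dashes, so
    max_pages_per_node=5 becomes "--max-pages-per-node 5".
    """
    cmd = ["python3", "run_pageindex.py", "--pdf_path", pdf_path]
    for name, value in options.items():
        cmd += ["--" + name.replace("_", "-"), str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "/path/to/your/document.pdf",
        max_pages_per_node=5,
        if_add_node_summary="no",
    )
    # Show the command line; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Passing a list rather than a single string avoids shell quoting problems when the PDF path contains spaces.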

Markdown support

PageIndex also supports Markdown input. Use the `--md_path` flag to generate a tree structure from a Markdown file.

python3 run_pageindex.py --md_path /path/to/your/document.md

Note: in this mode, "#" characters determine node headings and their levels: "##" is level 2, "###" is level 3, and so on. Make sure the headings in your Markdown file are formatted correctly. If the file was converted from a PDF or HTML, we don't recommend this mode, since most existing conversion tools cannot preserve the original hierarchy. Instead, use PageIndex OCR, which is designed to preserve it, to convert the PDF to Markdown, and then use this mode.
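
The heading rule in the note can be illustrated with a short Python sketch that nests headings by their "#" count. This is an illustration of the idea only, not PageIndex's actual parser: the function name `heading_tree`, the output keys, and the choice to skip lines inside fenced code blocks are all assumptions.

```python
import re
from typing import List

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # marker that opens or closes a fenced code block


def heading_tree(markdown: str) -> List[dict]:
    """Nest headings by '#' count: a heading becomes a child of the
    nearest preceding heading with a smaller level."""
    roots: List[dict] = []
    stack: List[dict] = []  # currently open ancestors, outermost first
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # '#' inside code is a comment, not a heading
        match = HEADING.match(line)
        if not match:
            continue
        node = {"title": match.group(2), "level": len(match.group(1)), "nodes": []}
        # Close every open heading at the same or a deeper level.
        while stack and stack[-1]["level"] >= node["level"]:
            stack.pop()
        (stack[-1]["nodes"] if stack else roots).append(node)
        stack.append(node)
    return roots


if __name__ == "__main__":
    sample = "# Report\n## Overview\n### Scope\n## Results\n# Appendix"
    for root in heading_tree(sample):
        print(root["title"], [child["title"] for child in root["nodes"]])
```

A file whose heading levels were flattened during conversion (for example, every heading emitted as "##") would come out of this procedure as a flat list, which is the loss of hierarchy the note warns about.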

# Install optional dependency
pip3 install openai-agents

# Run the demo
python3 examples/agentic_vectorless_rag_demo.py

Mingtian Zhang, Yu Tang and PageIndex Team,
"PageIndex: Next-Generation Vectorless, Reasoning-based RAG",
PageIndex Blog, Sep 2025.

Or use the BibTeX citation.

@article{zhang2025pageindex,
  author = {Mingtian Zhang and Yu Tang and PageIndex Team},
  title = {PageIndex: Next-Generation Vectorless, Reasoning-based RAG},
  journal = {PageIndex Blog},
  year = {2025},
  month = {September},
  note = {https://pageindex.ai/blog/pageindex-intro},
}

PageIndex

About this project

PageIndex: Vectorless, Reasoning-based RAG

📢 Updates

📑 Introduction to PageIndex

🎯 Core Features

📍 Explore PageIndex

🛠️ Deployment Options

🧪 Quick Hands-on

🌲 PageIndex Tree Structure

⚙️ Package Usage

1. Install dependencies

2. Set your LLM API key

3. Generate PageIndex structure for your PDF

Agentic Vectorless RAG: An Example

📈 Case Study: PageIndex Leads Finance QA Benchmark

🧭 Resources

⭐ Support Us

🌐 Open-Source Ecosystem

Connect with Us
