About

I build AI systems for governance, legal, and institutional applications, working with large collections of court judgments, policy documents, and administrative records.

I have a Ph.D. in Mathematics (Probability Theory) from Indiana University Bloomington. That background shapes how I approach text at scale: finding structure in noisy documents and quantifying things that seem hard to measure.

I've built large-scale datasets and used machine learning to study bias in judicial systems, analyze policy documents, and extract structured information from bureaucratic text. I'm interested in how these tools can surface patterns that would otherwise be invisible.

I currently work with organizations in international development, public finance, and legal tech.

Experience

AI Architect (Consultant)

Paradigm Case Management, Denver · 2024–Present

Developed the AI architecture and engineering for a legal workflow automation platform serving prosecutors and case management professionals. Designed and deployed production agentic systems using LangGraph and Neo4j knowledge graphs. Built multi-agent systems for structured information extraction, along with event-driven workflows using AWS Lambda and Bedrock.

Consultant (AI & Data Analytics)

World Bank Group, Washington D.C. · 2018–Present

Lead AI solutions development and large-scale data engineering for governance analytics across multi-country projects. Developed production RAG systems for knowledge search (built on Azure AI Search), expertise discovery and retrieval platforms, and data pipelines that automate the processing of BOOST expenditure and budget data from multiple countries. Built large-scale legal datasets covering India (83M+ district court cases), Kenya, and Indonesia (12M records).

Consultant (AI & NLP)

Global Environment Facility (GEF), Washington D.C. · 2024–2025

Developed structured information extraction to track technology use across the GEF project corpus, including classification and taxonomy development. Analyzed gaps between advisory recommendations and their implementation. Built policy coherence identification tools and automatic feature extraction that replicates expert analysis patterns.

Team Leadership

AI Architecture & Engineering

Paradigm Case Management · 2024–Present

Lead technical architecture and cross-functional collaboration with the backend engineering team. Drive architecture decisions for AI features, coordinate development priorities, and ensure alignment between AI capabilities and product requirements.

Team Lead, DeJure Projects

World Bank Group · 2019–2022

Led a team of 10 across multiple research projects, making technical decisions and managing deliverables. Coordinated data collection and analytics workflows, organized task distribution, and provided technical mentorship on data processing methodologies. Established success metrics and kept work aligned with project objectives.

World Bank, ITS

World Bank Group · 2025–Present

Coordinate with the development team to architect and deliver knowledge search and expertise discovery systems. Oversee technical requirements, feature prioritization, and deliverable timelines, keeping the technical implementation aligned with organizational needs.

Publications

Peer-Reviewed Journal Articles

Book Chapters

Conference Proceedings

Working Papers

Reports

Dissertation & Thesis

Teaching & Talks

Extracting Structured Insights from Public Finance Documents with LLMs

Computational Impact Meetup (AI/ML Series), World Bank · June 2024

Online Training Session for Judicial Officers

JALDI, Telangana State Judicial Academy, India · July 2023

Presented findings from the co-authored POCSO report to over 300 judges and judicial officers of the district judiciary, as part of a training session organized by Vidhi Centre for Legal Policy.

Natural Language Processing for Economic Research

Computational Impact Meetup, World Bank · May 2023

Natural Language Processing

KREA University, Sri City, India · March 2022 Trimester

Taught undergraduate course covering fundamental NLP techniques for studying document corpora. Students selected document collections, formulated research questions, and applied NLP methods to analyze their chosen texts.

Introduction to Data Pipelines

DEC Python Course: Advanced Topics in Data Science with Python, World Bank

Covered building data pipelines in Databricks, including medallion architecture, ETL and ELT pipeline patterns, and best practices for scalable data processing.

Teaching Assistant

Department of Mathematics, Indiana University, Bloomington, USA

Served as TA for undergraduate and graduate courses including Calculus I and II, Linear Algebra, Mathematical Analysis, Introduction to Mathematical Statistics, Probability Theory, and Stochastic Processes.

Collaborators

Selected Projects

BOOST & PFM Analytics

World Bank Group · 2024–Present

Built Delta Live Tables pipelines in Databricks to automate BOOST data harmonization workflows across countries.

Developed a bottleneck identification tool that extracts textual evidence from Public Expenditure Reviews and PFM reports across revenue management, budget planning, expenditure control, and institutional capacity.

Technologies/Tools: Delta Live Tables, Databricks, Instructor, OpenAI, Python, SQL

eCourts Dataset & Knowledge Search

World Bank Group · 2018–Present

Built one of the largest judicial datasets available, covering 83M+ district court cases, 8M high court cases, and 150K appellate cases from India. Developed web scraping infrastructure handling CAPTCHAs and dynamic content. Processed and structured data at scale for research and policy applications.

Designed a retrieval-augmented generation system using Azure AI Search for multi-document question answering across development policy documents. Implemented query decomposition, cross-encoder reranking, and synthesis optimization.

Technologies/Tools: Scrapy, Selenium, MongoDB, Azure AI Search, LangGraph, LangChain, Redis

Entity Resolution and Knowledge Graph System

Paradigm Case Management & World Bank Group · 2024–Present

Multi-agent system for identifying the same person across heterogeneous documents (court filings, police reports, case records). Implemented a reflection pattern for iterative accuracy improvement and consensus-based extraction using a judge-advocate-skeptic pattern.

Developed a Graph RAG system over Neo4j that automatically identifies missing evidence and documentation gaps by cross-referencing mentions across case files, police reports, and court filings. The system alerts prosecutors to incomplete document chains before trial preparation, supporting case integrity and discovery compliance.

Designed an expertise quantification system that aggregates signals across heterogeneous data sources for subject matter expert matching. Developed retrieval algorithms with multi-dimensional metrics that balance the recency, depth, and breadth of expertise.

Technologies/Tools: Neo4j, LangGraph, CrewAI, Graph Transformers, Cypher, AWS Lambda, Pydantic, PostgreSQL, Azure AI Search, FAISS

Witchcraft Court Case Pattern Analysis

Personal Research · 2023–2024

Multi-agent system for analyzing narrative structures in witchcraft court cases. Used graph transformers to extract knowledge graphs identifying relationships between accusers, defendants, and alleged acts. Agents collaboratively extract accusation patterns, classify crime typologies, and identify recurring themes across cases. An interactive t-SNE visualization lets users explore how different types of accusations and legal narratives cluster and evolve.

Technologies/Tools: Atomic Agents, Graph Transformers, OpenAI, Fly.io

Contact

sandeepbhupatiraju [at] gmail [dot] com