Import large document collections

Hundreds of contract PDFs to ingest? An entire SharePoint site? This guide explains which approach fits your scale and how to triage failed uploads quickly.

Key takeaways

  • Under 50 files: drag-and-drop. A few hundred to a thousand: batch upload. Entire SharePoint sites: the SharePoint Live connector.
  • The SharePoint Live connector keeps files in your Microsoft cloud; they never leave your tenant.
  • 90% of failures are scanned PDFs, encrypted files, or oversized Excel files, and the console shows the reason for each.

1. Pick the right method for your scale

Choose from three options based on how many files you have. Under 50 files: drag and drop them into the console. A few hundred to a thousand: use the Batch Upload button to select everything at once. An entire SharePoint site or OneDrive: enable the SharePoint Live connector for live indexing. With the connector, files stay in your Microsoft cloud the whole time, which helps meet enterprise compliance requirements.
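The decision rule above can be written down as a small helper, for example to label folders in an inventory spreadsheet before a migration. This is only a sketch: `pick_import_method` and its parameters are hypothetical names, not part of the product, and the guide does not say what to do between 50 and "a few hundred" files, so the code assumes batch upload for anything from 50 files up.

```python
def pick_import_method(file_count: int, whole_sharepoint_site: bool = False) -> str:
    """Return the import method the guide suggests for a given scale."""
    if file_count < 0:
        raise ValueError("file_count must be non-negative")
    if whole_sharepoint_site:
        # Entire SharePoint sites or OneDrive: files stay in the Microsoft cloud.
        return "SharePoint Live connector"
    if file_count < 50:
        return "drag-and-drop"
    # Assumption: the guide only names "a few hundred to a thousand" for batch
    # upload; this sketch treats every count from 50 up as a batch upload.
    return "Batch Upload"


if __name__ == "__main__":
    print(pick_import_method(12))
    print(pick_import_method(400))
    print(pick_import_method(5000, whole_sharepoint_site=True))
```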

2. Connect SharePoint Live

Go to Workspace Settings → SharePoint Live, authorize with your Microsoft 365 account, then pick the sites or folders to connect. Once the connector is set up, questions your team asks in the Playground are answered by querying your SharePoint files live. Update a document on SharePoint and the AI uses the new version instantly, with no re-upload needed.

3. Troubleshoot failed uploads

When a document shows Failed status, click into its detail page to see the reason. The top three causes and their fixes:

  • Scanned PDFs (image-only, no text layer): run OCR first to make them searchable.
  • Encrypted PPT files: remove the password in PowerPoint first.
  • Oversized Excel files: split them by worksheet and upload each part separately.
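When a batch leaves many failed documents, it can help to group them by fix so each remediation is done in one pass. The sketch below assumes you have noted each failed file and its cause by hand from the detail pages; the cause labels (`scanned_pdf`, `encrypted_ppt`, `oversized_excel`) and the `build_worklist` helper are hypothetical and do not correspond to any console export format.

```python
from collections import defaultdict

# Fixes as described in the guide; the dictionary keys are hypothetical labels.
FIXES = {
    "scanned_pdf": "Run OCR to add a text layer, then re-upload",
    "encrypted_ppt": "Remove the password in PowerPoint, then re-upload",
    "oversized_excel": "Split by worksheet and upload each part separately",
}
FALLBACK = "Check the reason on the document's detail page"


def build_worklist(failed):
    """Group (filename, cause) pairs by the fix each file needs."""
    worklist = defaultdict(list)
    for filename, cause in failed:
        worklist[FIXES.get(cause, FALLBACK)].append(filename)
    return dict(worklist)


if __name__ == "__main__":
    failed = [
        ("lease-2019.pdf", "scanned_pdf"),
        ("q3-review.pptx", "encrypted_ppt"),
        ("ledger.xlsx", "oversized_excel"),
        ("nda-scan.pdf", "scanned_pdf"),
    ]
    for fix, files in build_worklist(failed).items():
        print(f"{fix}: {', '.join(files)}")
```

Any cause outside the three listed falls back to a reminder to read the reason on the detail page, since the guide does not cover the remaining failures.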

4. Monitor batch ingest progress

The progress bar in the header shows how many documents are pending and processing, so you always know what's left. On the knowledge base (KB) detail page, filter documents by status (Pending / Processing / Ready / Failed) to spot problems quickly. After a batch ingest completes, run a few real questions in the Playground to validate answer quality.
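The arithmetic behind the progress bar is simple enough to sketch: what's left is the pending count plus the processing count, and a batch is finished once both reach zero. The example below assumes you have a plain list of status strings, one per document, copied from the KB detail page. The `summarize` function is a hypothetical helper, not a product API.

```python
from collections import Counter

# The four statuses named in the guide.
STATUSES = ("Pending", "Processing", "Ready", "Failed")


def summarize(statuses):
    """Count documents per status and work out how many are still in flight."""
    counts = Counter(statuses)
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"unexpected status values: {sorted(unknown)}")
    total = sum(counts.values())
    remaining = counts["Pending"] + counts["Processing"]
    return {
        "total": total,
        "remaining": remaining,
        "ready": counts["Ready"],
        "failed": counts["Failed"],
        # Finished means nothing is pending or processing any more.
        "finished": total > 0 and remaining == 0,
    }


if __name__ == "__main__":
    print(summarize(["Ready", "Ready", "Pending", "Processing", "Failed"]))
```

A batch can be finished and still contain Failed documents, so check the `failed` count before moving on to Playground validation.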