Most enterprise knowledge lives in SharePoint. Meeting notes, project proposals, technical documentation, compliance records — it sits across document libraries and site collections, invisible to any LLM without a retrieval layer.

I recently built a production RAG pipeline that connects SharePoint content to Azure AI Search, enabling an LLM to search and cite real company data during document generation. This post walks through the architecture, the implementation details, and the tradeoffs I encountered along the way.

The Architecture

The pipeline has four stages:

  1. Ingest — Azure AI Search indexer pulls documents from SharePoint on a recurring schedule
  2. Chunk & Embed — A skillset splits documents into chunks and generates vector embeddings
  3. Index — Chunks land in an Azure AI Search index with both text and vector fields
  4. Retrieve — Hybrid search (keyword + vector) finds relevant chunks at query time, fed to the LLM via tool calling

The LLM never sees raw SharePoint documents. It sees carefully chunked, embedded, and retrieved fragments — grounded context that keeps responses accurate and traceable.

Step 1: Connecting SharePoint as a Data Source

Azure AI Search has a built-in SharePoint Online indexer (currently in preview). It authenticates via OAuth using an Azure AD app registration. The connection string combines your SharePoint endpoint with the app credentials:

const connectionString = [
  `SharePointOnlineEndpoint=${process.env.SHAREPOINT_ENDPOINT}`,
  `ApplicationId=${process.env.SHAREPOINT_CLIENT_ID}`,
  `ApplicationSecret=${process.env.SHAREPOINT_CLIENT_SECRET}`,
  `TenantId=${process.env.SHAREPOINT_TENANT_ID}`,
].join(';');

The data source configuration points to the default site library. I found it simplest to index the entire library rather than filtering by folder at ingest time — folder-scoped filtering happens at query time instead.

const dataSourceConfig = {
  name: 'sharepoint-site-datasource',
  type: 'sharepoint',
  credentials: { connectionString },
  container: { name: 'defaultSiteLibrary' },
};

Creating the data source requires calling the REST API directly — the Azure Search SDK doesn't support SharePoint data sources natively:

const response = await fetch(
  `${AZURE_SEARCH_ENDPOINT}/datasources?api-version=2025-08-01-preview`,
  {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'api-key': AZURE_SEARCH_ADMIN_KEY,
    },
    body: JSON.stringify(dataSourceConfig),
  }
);

Important caveat: The SharePoint indexer is still in preview and Microsoft has indicated it is not yet planned for general availability. For production workloads, they recommend either using Microsoft Graph API to export content to Azure Blob Storage (then index from there), or using the Copilot Retrieval API. I chose the direct indexer for speed of iteration, with the understanding that the ingestion layer can be swapped later without changing the rest of the pipeline.
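If you later move to the Graph-to-Blob approach, an application-side export job will own ingestion. Hiding that job behind a small interface keeps the swap cheap. This is a hypothetical sketch (the interface and class names are mine, not part of the pipeline above), with an in-memory implementation standing in for a real source:

```typescript
// Hypothetical seam for the ingestion layer. The rest of the pipeline depends
// only on this interface, so the preview SharePoint indexer can later be
// replaced by a Graph-to-Blob export without touching chunking or retrieval.
interface RawDocument {
  id: string;
  path: string;
  content: string;
  lastModified: string; // ISO 8601 timestamp
}

interface IngestionSource {
  // Returns documents changed since the given timestamp (incremental pull)
  listChangedDocuments(since: string): Promise<RawDocument[]>;
}

// In-memory stand-in, useful for testing the downstream pipeline
class InMemoryIngestionSource implements IngestionSource {
  constructor(private docs: RawDocument[]) {}

  async listChangedDocuments(since: string): Promise<RawDocument[]> {
    // ISO 8601 strings compare correctly as plain strings
    return this.docs.filter((d) => d.lastModified > since);
  }
}
```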

Step 2: Defining the Search Index Schema

The search index needs both traditional text fields (for keyword search) and a vector field (for semantic search). Here’s the complete field configuration:

const fields = [
  {
    name: 'id',
    type: 'Edm.String',
    key: true,
    sortable: true,
    filterable: true,
    facetable: true,
    searchable: true,
    analyzerName: 'keyword',
  },
  { name: 'parent_id', type: 'Edm.String', filterable: true },
  { name: 'metadata_spo_item_name', type: 'Edm.String', searchable: true },
  {
    name: 'metadata_spo_item_path',
    type: 'Edm.String',
    searchable: true,
    filterable: true,
    sortable: true,
    analyzerName: 'keyword',
  },
  { name: 'metadata_spo_item_id', type: 'Edm.String', filterable: true },
  { name: 'metadata_spo_item_content_type', type: 'Edm.String', searchable: false },
  { name: 'metadata_spo_item_last_modified', type: 'Edm.DateTimeOffset', searchable: false },
  { name: 'chunk', type: 'Edm.String', searchable: true },
  {
    name: 'embedding',
    type: 'Collection(Edm.Single)',
    searchable: true,
    vectorSearchDimensions: 1536,
    vectorSearchProfileName: 'vector-profile-1',
  },
];

A few things to note:

  • metadata_spo_item_path uses a keyword analyzer. This enables exact prefix matching for folder-scoped queries later — critical for multi-tenant filtering.
  • The embedding field stores 1,536-dimensional vectors, matching Azure OpenAI’s embedding model output.
  • The chunk field is searchable, enabling BM25 keyword matching alongside vector search.
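Creating the index itself can follow the same REST pattern as the data source call above. This sketch builds the request as a plain object so it's easy to inspect (buildCreateIndexRequest is a hypothetical helper; the api-version mirrors the earlier example):

```typescript
// Sketch: build the REST request for creating the index, mirroring the
// data-source call shown earlier. buildCreateIndexRequest is a hypothetical
// helper, not an Azure SDK function.
function buildCreateIndexRequest(
  endpoint: string,
  adminKey: string,
  definition: { name: string; fields: object[]; vectorSearch: object }
) {
  return {
    // PUT to /indexes/{name} creates or updates the index (idempotent)
    url: `${endpoint}/indexes/${definition.name}?api-version=2025-08-01-preview`,
    init: {
      method: 'PUT',
      headers: {
        'Content-Type': 'application/json',
        'api-key': adminKey,
      },
      body: JSON.stringify(definition),
    },
  };
}
```

Pass the field array above as definition.fields and the vector configuration from the next step as definition.vectorSearch, then hand url and init to fetch.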

Step 3: Configuring the Vector Search Algorithm

The vector search algorithm determines how embeddings are compared at query time. I chose HNSW (Hierarchical Navigable Small World) with cosine similarity:

const vectorConfig = {
  algorithms: [
    {
      name: 'hnsw-1',
      kind: 'hnsw',
      parameters: {
        m: 4,            // connections per node (lower = faster, less accurate)
        efConstruction: 400,  // build-time search effort (higher = better index quality)
        efSearch: 500,        // query-time search effort (higher = better recall)
        metric: 'cosine',
      },
    },
  ],
  profiles: [
    {
      name: 'vector-profile-1',
      algorithmConfigurationName: 'hnsw-1',
    },
  ],
};

These parameters prioritize recall accuracy over raw speed. With efSearch: 500, the algorithm explores more candidates during queries, which matters when the correct document chunk might be one of hundreds of semantically similar results. For a pipeline where getting the right answer matters more than shaving milliseconds, this is the right tradeoff.

Step 4: Building the Skillset (Chunking + Embedding at Ingest Time)

Azure AI Search skillsets define processing steps that run during indexing. The skillset chains two key operations: text splitting (chunking) and embedding generation.

const skills = [
  {
    '@odata.type': '#Microsoft.Skills.Text.SplitSkill',
    name: 'text-split-skill',
    description: 'Split text into chunks',
    textSplitMode: 'pages',
    maximumPageLength: 1200,      // measured in characters by default, not tokens
    defaultLanguageCode: 'en',
    context: '/document',
    inputs: [{ name: 'text', source: '/document/content' }],
    outputs: [{ name: 'textItems', targetName: 'chunks' }],
  },
  {
    '@odata.type': '#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill',
    name: 'azure-openai-embedding-skill',
    description: 'Generate embeddings for each chunk',
    resourceUri: process.env.IEM_LLM_ENDPOINT,
    apiKey: process.env.IEM_LLM_API_KEY,
    modelName: process.env.IEM_LLM_EMBEDDING_MODEL,
    deploymentId: process.env.IEM_LLM_EMBEDDING_DEPLOYMENT_ID,
    dimensions: 1536,
    context: '/document/chunks/*',
    inputs: [{ name: 'text', source: '/document/chunks/*' }],
    outputs: [{ name: 'embedding', targetName: 'embedding' }],
  },
];

The SplitSkill caps chunks at 1,200 units per page. One caveat: maximumPageLength is measured in characters by default, not tokens — token-based splitting requires the skill's preview unit parameter. Either way, the size is a deliberate choice: large enough to preserve paragraph-level context, small enough to avoid diluting search relevance with irrelevant surrounding text.

The AzureOpenAIEmbeddingSkill runs on each chunk (/document/chunks/*), generating a 1,536-dimensional vector for every piece. This happens at ingest time, so queries only need to embed the query itself — not re-embed the entire corpus.

Index Projections: One Chunk Per Document

The skillset uses index projections to write one search document per chunk (not per source document):

const indexProjections = {
  selectors: [
    {
      targetIndexName: 'sharepoint-documents',
      parentKeyFieldName: 'parent_id',
      sourceContext: '/document/chunks/*',
      mappings: [
        { name: 'chunk', source: '/document/chunks/*' },
        { name: 'embedding', source: '/document/chunks/*/embedding' },
        { name: 'metadata_spo_item_name', source: '/document/metadata_spo_item_name' },
        { name: 'metadata_spo_item_path', source: '/document/metadata_spo_item_path' },
        { name: 'metadata_spo_item_id', source: '/document/parent_id_value' },
      ],
    },
  ],
  parameters: { projectionMode: 'skipIndexingParentDocuments' },
};

The skipIndexingParentDocuments parameter is important — without it, the index would contain both the full documents and their chunks, doubling storage and polluting search results.

Step 5: Creating the Indexer

The indexer ties together the data source, skillset, and target index. It runs on an hourly schedule and handles incremental updates automatically:

const indexer = {
  name: 'sharepoint-documents-indexer',
  dataSourceName: 'sharepoint-site-datasource',
  targetIndexName: 'sharepoint-documents',
  skillsetName: 'sharepoint-documents-skillset',
  parameters: {
    configuration: {
      indexedFileNameExtensions: '.pdf, .docx',
      excludedFileNameExtensions: '.png, .jpg, .tif, .zip',
      dataToExtract: 'contentAndMetadata',
      allowSkillsetToReadFileData: true,
      failOnUnsupportedContentType: false,
      failOnUnprocessableDocument: false,
    },
    maxFailedItems: 0,
    maxFailedItemsPerBatch: 0,
  },
  schedule: { interval: 'PT1H' },
};

Key configuration decisions:

  • File types: Only .pdf and .docx. Images and archives are excluded — they don’t contain searchable text.
  • failOnUnsupportedContentType: false: The indexer skips unsupported files instead of failing the entire batch. SharePoint libraries inevitably contain random file types.
  • maxFailedItems: 0: Strict — any failure causes the indexer run to report as failed. This ensures you notice data quality issues early.
  • PT1H schedule: Every hour. Frequent enough to pick up document changes within a workday, infrequent enough to avoid unnecessary API costs.
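With maxFailedItems: 0 in place, you want to notice failed runs quickly. The Get Indexer Status REST call returns a lastResult object that a small helper can turn into a one-line summary. A sketch only — the field names (itemCount, failedItemCount) are my reading of the REST response shape, so verify them against your api-version:

```typescript
// Sketch: interpret the `lastResult` object from Get Indexer Status.
// Field names here are assumptions based on the documented response shape.
interface IndexerLastResult {
  status: string;           // e.g. 'success', 'transientFailure', 'inProgress'
  itemCount: number;        // documents processed in the run
  failedItemCount: number;  // documents that failed
  errorMessage?: string;
}

function summarizeIndexerRun(lastResult: IndexerLastResult): string {
  // With maxFailedItems: 0, a single failed item fails the whole run
  if (lastResult.failedItemCount > 0) {
    return `FAILED: ${lastResult.failedItemCount}/${lastResult.itemCount} items (${lastResult.errorMessage ?? 'no message'})`;
  }
  if (lastResult.status === 'success') {
    return `OK: ${lastResult.itemCount} items indexed`;
  }
  return `PENDING: status=${lastResult.status}`;
}
```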

Step 6: Generating Query Embeddings

At query time, the user’s search query needs to be embedded using the same model that embedded the chunks. The embedding service wraps Azure OpenAI with batch processing and retry logic:

import { AzureOpenAI } from 'openai';

export function createAzureEmbeddingService(config = { maxBatchSize: 16 }) {
  const client = new AzureOpenAI({
    endpoint: process.env.IEM_LLM_ENDPOINT,
    apiKey: process.env.IEM_LLM_API_KEY,
    deployment: process.env.IEM_LLM_EMBEDDING_DEPLOYMENT_ID, // the deployment name, matching the skillset config
    apiVersion: '2024-12-01-preview',
  });

  async function generateEmbeddings(texts: string[]): Promise<number[][]> {
    if (!texts.length) return [];

    // Batch into groups of 16 (a conservative cap; older Azure OpenAI
    // api-versions limited embedding calls to 16 inputs per request)
    const batches: string[][] = [];
    for (let i = 0; i < texts.length; i += config.maxBatchSize) {
      batches.push(texts.slice(i, i + config.maxBatchSize));
    }

    const results: number[][] = [];
    for (const batch of batches) {
      const vectors = await fetchWithRetry(batch);
      results.push(...vectors);
    }
    return results;
  }

  async function fetchWithRetry(batch: string[]): Promise<number[][]> {
    const maxRetries = 4;
    let attempt = 0;

    while (attempt < maxRetries) {
      try {
        const response = await client.embeddings.create({
          input: batch,
          model: process.env.IEM_LLM_EMBEDDING_MODEL,
        });
        return response.data.map((item) => item.embedding);
      } catch (error) {
        if (attempt === maxRetries - 1) {
          throw error; // surface the real failure instead of a generic message
        }
        const delay = Math.pow(2, attempt) * 500 + Math.random() * 200;
        await new Promise((r) => setTimeout(r, delay));
        attempt++;
      }
    }
    throw new Error('All embedding request attempts failed');
  }

  return { generateEmbeddings };
}

The retry logic uses exponential backoff with jitter (Math.pow(2, attempt) * 500 + Math.random() * 200). Azure OpenAI rate limits are aggressive, and the jitter prevents thundering herd problems when multiple requests fail simultaneously.
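Pulled out as a pure function, the delay schedule is easy to inspect. The jitter source is injectable here (my addition, for testability — the service above calls Math.random directly), so the curve can be checked with jitter pinned to zero:

```typescript
// The delay schedule from fetchWithRetry as a pure function.
// The jitter source is injectable so tests can pin it to a constant.
function backoffDelayMs(
  attempt: number,
  jitter: () => number = Math.random
): number {
  // Exponential base doubles per attempt; jitter adds up to 200 ms on top
  return Math.pow(2, attempt) * 500 + jitter() * 200;
}

// With zero jitter, attempts 0..3 wait 500, 1000, 2000, 4000 ms
```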

Step 7: Hybrid Search (Keyword + Vector)

The search repository performs hybrid queries that combine BM25 keyword matching with vector similarity. This is where the magic happens:

async function hybridSearch<T extends object>(
  folderId: string,
  queryText: string,
  embedding: number[],
  k: number,
  select?: string[]
): Promise<T[]> {
  // Build folder-scoped filter using range-based prefix match
  let filter: string | undefined;
  const folderPath = await sharepointService.getFolderServerRelativePath(folderId);
  if (folderPath) {
    // Range filter: [prefix, prefix + \uFFFF); escape single quotes for OData
    const escaped = folderPath.replace(/'/g, "''");
    filter = `metadata_spo_item_path ge '${escaped}' and metadata_spo_item_path lt '${escaped}\uFFFF'`;
  }

  const results = await searchService.search<T>(indexName, queryText, {
    embedding,
    k,
    select,
    filter,
  });

  return results.map((r) => r.document);
}

The folder path filter deserves attention. Rather than an exact match, it uses a range-based prefix filter: metadata_spo_item_path ge '/sites/docs/proposals' and lt '/sites/docs/proposals\uFFFF'. This matches all documents whose path starts with the folder prefix, regardless of subfolder depth. It’s simple, fast, and works with Azure Search’s keyword analyzer.
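The filter builder is worth isolating as a pure function, with OData single-quote escaping included — a raw apostrophe in a folder path would otherwise break the filter expression:

```typescript
// The range-based prefix filter as a pure, testable function.
function buildFolderPrefixFilter(folderPath: string): string {
  const escaped = folderPath.replace(/'/g, "''"); // OData escapes ' as ''
  // Matches every path in [prefix, prefix + \uFFFF), i.e. any subfolder depth
  return `metadata_spo_item_path ge '${escaped}' and metadata_spo_item_path lt '${escaped}\uFFFF'`;
}
```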

The search options builder combines keyword and vector search:

function buildSearchOptions(opts) {
  const { embedding, k = 20, select, filter } = opts;

  const options = { top: k, filter };
  if (select) options.select = select;

  if (embedding) {
    const neighbors = Math.max(3, Math.min(50, k));
    options.vectorSearchOptions = {
      queries: [
        {
          kind: 'vector',
          vector: embedding,
          kNearestNeighborsCount: neighbors,
          fields: ['embedding'],
        },
      ],
    };
  }

  return options;
}

When both queryText and embedding are provided, Azure AI Search automatically performs hybrid search — fusing BM25 keyword scores with vector similarity scores. This matters because the two approaches fail in complementary ways:

  • Keyword search misses semantic matches. “Staff augmentation” won’t match a chunk about “team expansion services.”
  • Vector search misses exact matches. Searching for a specific project name or acronym works better with keyword matching.

Hybrid search gets the best of both.
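Under the hood, Azure AI Search merges the two ranked lists with Reciprocal Rank Fusion (RRF). A simplified sketch shows why rank-based fusion works without ever normalizing the two scoring scales against each other:

```typescript
// Simplified Reciprocal Rank Fusion (RRF). Scores depend only on rank
// position in each list, so BM25 scores and cosine similarities never need
// to be compared directly.
function rrfFuse(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, rank) => {
      // Each list contributes 1 / (k + rank + 1) for the doc's position
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return scores;
}

// A doc ranked well in both lists outscores one ranked in only one list:
const fused = rrfFuse([
  ['a', 'b', 'c'], // BM25 ranking
  ['b', 'a', 'd'], // vector ranking
]);
```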

Step 8: Giving the LLM a Search Tool

The retrieval results don’t just get prepended to a prompt. Instead, the LLM is given a tool — search_company_knowledge — that it can call mid-generation when it needs specific data.

import { tool } from '@langchain/core/tools';
import { z } from 'zod';

const searchSchema = z.object({
  query: z.string().describe(
    'Search query for finding relevant company knowledge. Be specific.'
  ),
  k: z.number().optional().default(5).describe(
    'Number of results to return (default: 5, max: 10)'
  ),
});

export function createSearchCompanyKnowledgeTool(deps) {
  const { documentSearchRepository, embeddingService, companyDocumentsFolderId } = deps;

  const searchHandler = async ({ query, k = 5 }) => {
    const maxK = Math.min(k, 10);

    // Generate embedding for hybrid search
    const [queryEmbedding] = await embeddingService.generateEmbeddings([query]);

    // Execute hybrid search
    const results = await documentSearchRepository.hybridSearch(
      companyDocumentsFolderId ?? 'global',
      query,
      queryEmbedding,
      maxK,
      ['id', 'metadata_spo_item_name', 'metadata_spo_item_path', 'chunk']
    );

    if (results.length === 0) {
      return JSON.stringify({
        success: true,
        message: 'No relevant documents found. Use placeholders for specific data.',
        results: [],
      });
    }

    return JSON.stringify({
      success: true,
      message: `Found ${results.length} relevant documents.`,
      results: results.map((r) => ({
        id: r.id,
        fileName: r.metadata_spo_item_name,
        filePath: r.metadata_spo_item_path,
        chunk: r.chunk,
      })),
    });
  };

  return tool(searchHandler, {
    name: 'search_company_knowledge',
    description: `Search internal documents for real project examples, metrics,
      client references, and company capabilities. You MUST use this tool when
      you need specific project names, quantitative metrics, or past performance examples.`,
    schema: searchSchema,
  });
}

The tool description is critical. It tells the LLM when to search (specific claims, metrics, project names) and what to expect back (chunks from indexed documents). The system prompt reinforces this with strict data authenticity rules:

  • The LLM must call the tool when claiming specific metrics, project names, or client references
  • The LLM must use only real data from search results
  • When data is unavailable, use explicit placeholders like [Insert specific project example] rather than fabricating

This is essential for enterprise use cases where credibility depends on real, verifiable claims. A hallucinated project name or fabricated statistic would destroy trust.

The Orchestration Layer: LangGraph

The full generation pipeline uses LangGraph to orchestrate a 6-node workflow:

  1. identifyRfpSection — A lightweight model identifies which section of the source document is relevant
  2. filterRfpContext — Extracts just the relevant section (reducing token usage)
  3. buildContext — Assembles company info + filtered source content + compliance data into structured markdown
  4. buildPrompt — Constructs system and user messages
  5. invokeGeneration — Calls the LLM with the search tool bound, enabling mid-generation retrieval
  6. parseResponse — Extracts and validates the output

The key insight is that the LLM decides when to search. During step 5, it can call search_company_knowledge as many times as it needs, interleaving retrieval with generation rather than relying on a context window stuffed up front with chunks that may turn out to be irrelevant.
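Stripped of LangGraph's state management and conditional edges, the workflow is essentially a fold over six node functions. This sketch captures that basic shape (the state type and node signatures are illustrative, not LangGraph's API):

```typescript
// The 6-node workflow reduced to its basic shape: each node reads the
// accumulated state and returns updates that are merged back in.
type WorkflowState = {
  rfpSection?: string;
  filteredContext?: string;
  contextMarkdown?: string;
  messages?: string[];
  rawOutput?: string;
  result?: string;
};

type WorkflowNode = (
  state: WorkflowState
) => WorkflowState | Promise<WorkflowState>;

async function runWorkflow(
  nodes: WorkflowNode[],
  initial: WorkflowState = {}
): Promise<WorkflowState> {
  let state = initial;
  for (const node of nodes) {
    // Merge each node's partial update into the accumulated state
    state = { ...state, ...(await node(state)) };
  }
  return state;
}
```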

Token Budget Strategy

Working within a 128K context window requires deliberate budgeting:

Component                         Token Budget
--------------------------------  ------------
Source content (filtered)         ~20K
Compliance matrix                 ~10K
Template instructions             ~10K
Generation output                 ~40K
System prompts, examples, buffer  ~48K

The lightweight pre-filtering step (using a smaller model) reduces the full source document to just the relevant section before the expensive generation model sees it. This keeps costs down and improves relevance.
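The budget can be enforced mechanically before invoking the model. A sketch, assuming token counts arrive from a tokenizer upstream (the constants are the ones from the table; the helper name is mine):

```typescript
// Budget constants from the table above — they sum to the 128K window.
const TOKEN_BUDGET = {
  sourceContent: 20_000,
  complianceMatrix: 10_000,
  templateInstructions: 10_000,
  generationOutput: 40_000,
  systemAndBuffer: 48_000,
} as const;

type BudgetKey = keyof typeof TOKEN_BUDGET;

// Returns the components whose measured token count exceeds their budget line
function overBudget(usage: Partial<Record<BudgetKey, number>>): BudgetKey[] {
  return (Object.keys(usage) as BudgetKey[]).filter(
    (key) => (usage[key] ?? 0) > TOKEN_BUDGET[key]
  );
}
```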

What I Learned

Chunking is the highest-leverage decision in the pipeline. Bad chunks produce bad retrieval, which produces bad generation. The built-in SplitSkill at 1,200 tokens per page was a good starting point, but I also built a custom hierarchical chunker for cases where more structure-aware splitting was needed — splitting on headings first, then paragraphs, then sentences, with 2-sentence overlap between chunks.
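A simplified version of that hierarchical chunker looks like this — headings first, then paragraphs, with the last two sentences carried forward as overlap. (A sketch only: it measures length in characters rather than tokens and omits the sentence-level fallback the production chunker has.)

```typescript
// Simplified hierarchical chunker: split on headings first, then paragraphs,
// with a 2-sentence overlap between adjacent chunks.
function hierarchicalChunk(text: string, maxLen = 1200): string[] {
  // Level 1: split before markdown-style heading lines
  const sections = text.split(/(?=^#{1,6} )/m);
  const chunks: string[] = [];

  for (const section of sections) {
    if (section.length <= maxLen) {
      if (section.trim()) chunks.push(section.trim());
      continue;
    }
    // Level 2: split oversized sections on blank lines (paragraph boundaries)
    let current = '';
    for (const para of section.split(/\n\s*\n/)) {
      if (current && current.length + para.length > maxLen) {
        chunks.push(current.trim());
        // Carry the last 2 sentences forward as overlap for the next chunk
        const sentences = current.match(/[^.!?]+[.!?]+/g) ?? [];
        current = sentences.slice(-2).join(' ').trim();
      }
      current += (current ? '\n\n' : '') + para;
    }
    if (current.trim()) chunks.push(current.trim());
  }
  return chunks;
}
```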

Hybrid search is worth the complexity. Pure vector search missed exact-match queries that mattered (project names, acronyms, specific dates). Adding keyword search alongside vector search eliminated an entire class of retrieval failures.

Tool calling beats prompt stuffing. Giving the LLM a search tool and letting it decide what to retrieve produced better results than pre-loading the context window with potentially irrelevant chunks. The LLM searches for what it actually needs, when it needs it.

The SharePoint indexer works but has limits. It’s convenient for prototyping and early production, but the preview status means you should architect the ingestion layer to be swappable. The rest of the pipeline — chunking, embedding, search, retrieval, generation — is independent of how documents enter the system.

Data authenticity rules in the system prompt matter. Without explicit instructions to search before claiming and to use placeholders when data is missing, the LLM will confidently fabricate details. For enterprise use cases, this is a dealbreaker.

The keyword analyzer on path fields enables clean folder scoping. Using a range-based prefix filter on metadata_spo_item_path provides fast, index-time folder scoping without needing separate indexes per folder.

The Stack

Component       Technology
--------------  ------------------------------------------------------------
Ingestion       Azure AI Search SharePoint indexer (hourly)
Chunking        Azure SplitSkill (1,200-char pages) + custom hierarchical chunker
Embeddings      Azure OpenAI (1,536 dimensions)
Vector Search   HNSW algorithm (cosine, m=4, efSearch=500)
Index           Azure AI Search (hybrid: BM25 + vector)
Orchestration   LangGraph state machine (6-node workflow)
LLM             Azure OpenAI with tool calling
Runtime         Node.js, TypeScript, NestJS

Where This Is Heading

The next evolution is agentic retrieval — where the search layer itself becomes an agent that can plan multi-step queries, reformulate searches based on initial results, and maintain context across a conversation. Azure AI Search recently previewed agentic retrieval capabilities, and the patterns described here are a natural foundation for that transition.

RAG pipelines are not a solved problem. But the combination of structure-aware chunking, hybrid search, dynamic tool calling, and strict data authenticity rules gets you surprisingly far — especially when the alternative is an LLM making things up about your company.