# Knowledge Sources

Knowledge Sources provide custom context that Cuppa references when generating content. Upload documents or paste text to give the AI specific information about your products, processes, or expertise that isn't available on the public web.

***

### Why Knowledge Sources Matter

Public AI models only know what's in their training data. They don't know:

* Your specific product features and pricing
* Internal processes and methodologies
* Proprietary research and data
* Company policies and guidelines
* Industry-specific terminology you use
* Your customer support phone number

Knowledge Sources bridge this gap. When you add a Knowledge Source, Cuppa splits your content into searchable chunks, stores them as vector embeddings, and retrieves the most relevant pieces during article generation.

This is **RAG (Retrieval-Augmented Generation)** in action.

***

### How It Works

When you upload a Knowledge Source:

1. **Chunking**: Your content is split into smaller pieces (roughly 1,000 characters each for files, 400 for text)
2. **Embedding**: Each chunk is converted into a vector embedding using an OpenAI embedding model
3. **Storage**: Embeddings are stored in your team's knowledge base
4. **Retrieval**: During generation, Cuppa searches for chunks relevant to your topic and includes them in the AI prompt
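
The chunking step can be pictured with a minimal Python sketch. It assumes plain fixed-size character splitting and uses the approximate sizes quoted above. The `chunk_text` function and both constants are hypothetical names for illustration; Cuppa's actual splitter may respect sentence boundaries or overlap chunks.

```python
# Approximate chunk sizes quoted in this guide; the exact values may differ.
FILE_CHUNK_SIZE = 1000  # characters per chunk for uploaded files
TEXT_CHUNK_SIZE = 400   # characters per chunk for pasted text


def chunk_text(content: str, chunk_size: int) -> list[str]:
    """Split content into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [content[i:i + chunk_size] for i in range(0, len(content), chunk_size)]


pasted = "A" * 900  # stand-in for 900 characters of pasted text
chunks = chunk_text(pasted, TEXT_CHUNK_SIZE)
print([len(c) for c in chunks])  # [400, 400, 100]
```

Under this assumption, the same 900 characters uploaded as a file would fit in a single chunk.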

#### Chunk Limits

Each Knowledge Source can store up to **100 chunks** (approximately 50,000 characters or 12,500 tokens). If your document exceeds this limit:

* The first 100 chunks are indexed
* Remaining content is not searchable
* You'll see a warning: "100 chunks indexed (limited from X)"

**Tip:** For large documents, split them into multiple focused Knowledge Sources for better coverage.
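
To get a rough idea of whether a document will hit the limit before you upload it, you can estimate the chunk count from its length. The sketch below assumes fixed-size chunks, and `indexing_message` is a hypothetical helper that only mimics the wording of the status messages described above. Real chunk counts depend on how Cuppa splits your content, so treat the result as an estimate.

```python
import math

MAX_CHUNKS = 100  # per-source limit stated in this guide


def indexing_message(num_chars: int, chunk_size: int) -> str:
    """Estimate the status message for a document of num_chars characters."""
    total = math.ceil(num_chars / chunk_size)
    if total > MAX_CHUNKS:
        return f"{MAX_CHUNKS} chunks indexed (limited from {total})"
    return f"{total} chunks indexed"


# A 250,000-character file at roughly 1,000 characters per chunk
print(indexing_message(250_000, 1000))  # 100 chunks indexed (limited from 250)
# A 30,000-character file fits within the limit
print(indexing_message(30_000, 1000))   # 30 chunks indexed
```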

***

### Source Types

#### Text

Paste content directly into Cuppa.

**Best for:**

* Product descriptions
* FAQ content
* Style guidelines
* Key messaging
* Contact information
* Boilerplate text

**Example:**

```
Our flagship product, ContentFlow Pro, offers:
- Unlimited team seats ($99/month)
- AI-powered content optimization
- 50+ CMS integrations
- 24/7 priority support

Contact us: support@contentflow.com | 1-800-555-0123

Key differentiator: Only platform with real-time SEO scoring during editing.
```

#### File Upload

Upload PDF, TXT, or Markdown files up to **50MB**.

**Best for:**

* Product documentation
* White papers
* Research reports
* Employee handbooks
* Training materials

**Supported formats:**

| Format   | Notes                                                                            |
| -------- | -------------------------------------------------------------------------------- |
| PDF      | Text-based PDFs only. Scanned/image PDFs are not supported (no text to extract). |
| TXT      | Plain text files                                                                 |
| Markdown | .md files treated as text                                                        |

**Important:** PDFs must contain actual text, not images of text. If you can't select/copy text in your PDF, it's image-based and won't work. Use a text-based export or OCR tool first.

***

### Adding Knowledge Sources

1. Navigate to **AI Instructions > Brand Knowledge**
2. Click **Create new knowledge source**
3. Choose source type (Text or File)
4. Provide content and metadata:
   * **Name**: Descriptive name (e.g., "Product Features 2024")
   * **Description**: What this source contains
5. Click **Save**

After saving, you'll see indexing stats showing how many chunks were created.

#### Understanding Indexing Stats

After upload, each source displays:

* **"X chunks indexed"**: Your content was fully indexed
* **"X chunks indexed (limited from Y)"**: Content exceeded the 100-chunk limit

If limited, consider splitting the document into smaller, topic-focused sources.

***

### What to Include

Focus on information the AI can't find elsewhere:

* ✅ **Product specifics**: Features, pricing, specifications, SKUs
* ✅ **Contact information**: Phone numbers, emails, addresses
* ✅ **Brand guidelines**: Terminology, messaging, values
* ✅ **FAQs**: Common questions with approved answers
* ✅ **Case studies**: Customer success stories with metrics
* ✅ **Technical docs**: How things work, integrations, specs
* ✅ **Competitive positioning**: How you differ from competitors
* ✅ **Policies**: Return policies, guarantees, terms

#### What NOT to Include

* ❌ **Sensitive data**: Passwords, API keys, personal customer information
* ❌ **Massive documents**: Split into focused topics instead
* ❌ **Outdated information**: Causes incorrect outputs
* ❌ **Conflicting information**: Creates inconsistent content
* ❌ **Image-based PDFs**: Scanned documents without extractable text

***

### Best Practices

#### Keep Sources Focused

Smaller, topic-specific sources retrieve more accurately than massive documents.

**Note:** A brand can have multiple Knowledge Sources, but only one can be selected per generation when you build content.

| Content Type     | Recommended Approach                        |
| ---------------- | ------------------------------------------- |
| Product catalog  | One source per product line                 |
| FAQ documents    | Group by topic (billing, features, support) |
| Style guidelines | Single comprehensive source                 |
| Technical docs   | Split by feature area                       |

#### Use Descriptive Names

* Good: "Enterprise Pricing 2026" or "Return Policy FAQ"
* Bad: "Document1" or "Info"

Names help you manage sources and help Cuppa understand context.

#### Update Regularly

Knowledge Sources reflect a point in time. Review quarterly:

1. Remove outdated sources
2. Update changed information
3. Add new products/features

#### Test Retrieval

After adding a source, test it in Agentic Chat:

> "What is our phone number for customer support?"

If the answer is correct, your Knowledge Source is working.

***

### How Knowledge Sources Are Used

#### During Article Generation

When generating content, Cuppa:

1. Analyzes your topic and keywords
2. Searches your Knowledge Sources for relevant information
3. Retrieves up to 12 of the most relevant chunks
4. Includes that context in the generation prompt

The AI sees your custom information alongside web research, creating content that's both current and accurate to your brand.
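
The retrieval step can be illustrated with a small similarity search. This is a sketch under stated assumptions, not Cuppa's implementation: it assumes cosine similarity as the relevance measure, uses tiny hand-written vectors in place of real embeddings, and `retrieve`, `min_score`, and the sample chunks are hypothetical.

```python
import math

TOP_K = 12  # maximum number of chunks retrieved, per this guide


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, indexed_chunks, top_k=TOP_K, min_score=0.0):
    """Return up to top_k chunk texts, most similar first, scoring at least min_score."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed_chunks]
    scored = [pair for pair in scored if pair[0] >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


# Toy data: (chunk text, made-up 3-dimensional "embedding")
indexed = [
    ("Unlimited team seats ($99/month)", [0.9, 0.1, 0.0]),
    ("Contact us: 1-800-555-0123", [0.1, 0.9, 0.0]),
    ("50+ CMS integrations", [0.0, 0.1, 0.9]),
]
pricing_query = [1.0, 0.2, 0.0]  # stand-in for an embedded pricing question
print(retrieve(pricing_query, indexed, top_k=2))
```

A chunk that scores below `min_score` is never returned, which is one way content can go unreferenced even though it has been indexed.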

#### In Agentic Chat

Chat can access your Knowledge Sources directly:

* Ask questions about your products
* Request content using specific sources
* Fact-check against your documentation

**Example prompts:**

* "Using our pricing documentation, write a comparison table"
* "What does our product guide say about the enterprise tier?"
* "Draft an email using our approved messaging"

***

### Source Management

#### Updating Content

| Source Type | How to Update          |
| ----------- | ---------------------- |
| Text        | Edit directly in Cuppa |
| File        | Delete and re-upload   |

***

### Troubleshooting

#### "Content not being referenced"

**Possible causes:**

* Content isn't semantically relevant to your topic
* Similarity threshold not met

**Solutions:**

* Ensure your content uses terminology related to your topic
* Mention the source explicitly in chat for testing

#### "Wrong information being used"

**Cause:** Outdated or conflicting sources.

**Solution:** Audit sources, remove outdated content, resolve conflicts.

#### "0 chunks indexed"

**Cause:** PDF is image-based (scanned), not text-based.

**Solution:** Use a PDF with actual text, or convert with an OCR tool first. If you can't select/copy text in your PDF viewer, it's image-based.

#### "X chunks indexed (limited from Y)"

**Cause:** Document exceeded the 100-chunk limit.

**Solution:** Split into multiple smaller Knowledge Sources organized by topic.
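
If the oversized document is Markdown or plain text with headings, one way to split it by topic is to cut at each heading and save every section as its own Knowledge Source. The sketch below runs locally on your own file before upload. `split_by_heading` is a hypothetical helper, and it assumes `##`-style headings; text before the first heading is ignored, and heading-like lines inside code blocks are not handled.

```python
import re


def split_by_heading(markdown: str, level: int = 2) -> dict[str, str]:
    """Return {heading title: section text} for headings of the given level."""
    pattern = re.compile(rf"^{'#' * level} +(.+)$", re.MULTILINE)
    matches = list(pattern.finditer(markdown))
    sections = {}
    for i, match in enumerate(matches):
        # A section runs from its heading to the next heading of the same level.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        sections[match.group(1).strip()] = markdown[match.start():end].strip()
    return sections


handbook = """## Billing
Invoices are sent monthly.

## Support
Call 1-800-555-0123.
"""

for name, body in split_by_heading(handbook).items():
    print(name, "->", len(body), "characters")
```

Each key makes a reasonable starting point for a descriptive source name.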

#### "File upload failed"

**Check:**

* File is under 50MB
* Format is PDF, TXT, or Markdown
* PDF isn't password-protected
* PDF contains actual text (not scanned images)
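
The first two checks are easy to run locally before uploading. This is a minimal sketch: `preflight_problems` is a hypothetical helper, the extension list is inferred from the supported formats above, and treating 50MB as 50 × 1024 × 1024 bytes is an assumption. Detecting password protection or scanned pages requires a PDF library and is not covered here.

```python
import os
import tempfile

MAX_BYTES = 50 * 1024 * 1024  # 50MB limit from this guide (binary megabytes assumed)
ALLOWED_EXTENSIONS = {".pdf", ".txt", ".md"}


def preflight_problems(path: str) -> list[str]:
    """Return a list of reasons the file would likely be rejected (empty if none)."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 50MB")
    return problems


# Demo with an empty placeholder file in a temporary folder
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "handbook.docx")
    with open(sample, "wb"):
        pass  # create the empty file
    print(preflight_problems(sample))  # ['unsupported format: .docx']
```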

***

### Related Features

* [AI Instructions](/brand-dna/ai-instructions-formerly-presets.md): Control generation settings and prompts
* [Brand Voice](/brand-dna/brand-voice.md): Consistent tone and style
* [Agentic Chat](/cuppa-chat/agentic-ai.md): Chat that uses your knowledge

