Knowledge Sources

Custom Knowledge Sources

Knowledge Sources provide custom context that Cuppa references when generating content. Upload documents or paste text to give the AI specific information about your products, processes, or expertise that isn't available on the public web.


Why Knowledge Sources Matter

Public AI models only know what's in their training data. They don't know:

  • Your specific product features and pricing

  • Internal processes and methodologies

  • Proprietary research and data

  • Company policies and guidelines

  • Industry-specific terminology you use

  • Your customer support phone number

Knowledge Sources bridge this gap. When you add a Knowledge Source, Cuppa splits your content into searchable chunks, stores them as vector embeddings, and retrieves the most relevant pieces during article generation.

This is RAG (Retrieval-Augmented Generation) in action.


How It Works

When you upload a Knowledge Source:

  1. Chunking: Your content is split into smaller pieces (roughly 1,000 characters each for uploaded files, 400 for pasted text)

  2. Embedding: Each chunk is converted into a vector embedding using OpenAI

  3. Storage: Embeddings are stored in your team's knowledge base

  4. Retrieval: During generation, Cuppa searches for chunks relevant to your topic and includes them in the AI prompt
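The chunking step can be pictured as a simple fixed-size split. The sketch below is an illustration only: it assumes chunks are cut purely by character count with no overlap, which may not match how Cuppa actually splits content (real splitters often respect sentence boundaries or overlap neighboring chunks). `chunk_text` is a hypothetical name.

```python
# Illustrative sketch: fixed-size character chunking with NO overlap (an assumption).
# Cuppa's real splitter may behave differently.

FILE_CHUNK_SIZE = 1_000  # approximate chunk size for uploaded files (from this page)
TEXT_CHUNK_SIZE = 400    # approximate chunk size for pasted text (from this page)


def chunk_text(content: str, chunk_size: int) -> list[str]:
    """Split content into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Step through the string in chunk_size strides; the last piece may be shorter.
    return [content[i:i + chunk_size] for i in range(0, len(content), chunk_size)]


if __name__ == "__main__":
    pasted = "Our support line is open 9am-5pm. " * 30  # 1,020 characters
    chunks = chunk_text(pasted, TEXT_CHUNK_SIZE)
    print(len(chunks))  # 3 chunks: 400 + 400 + 220 characters
```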

Chunk Limits

Each Knowledge Source can store up to 100 chunks (approximately 50,000 characters or 12,500 tokens). If your document exceeds this limit:

  • The first 100 chunks are indexed

  • Remaining content is not searchable

  • You'll see a warning: "100 chunks indexed (limited from X)"
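The truncation rule can be sketched as follows. This is a hypothetical illustration of the documented behavior (keep the first 100 chunks, report the original count in the status message), not Cuppa's code; `index_chunks` is an invented name.

```python
MAX_CHUNKS = 100  # per-source limit stated on this page


def index_chunks(chunks: list[str], max_chunks: int = MAX_CHUNKS) -> tuple[list[str], str]:
    """Keep only the first max_chunks chunks and build the indexing status message."""
    indexed = chunks[:max_chunks]  # anything past the limit is never embedded
    if len(chunks) > max_chunks:
        status = f"{len(indexed)} chunks indexed (limited from {len(chunks)})"
    else:
        status = f"{len(indexed)} chunks indexed"
    return indexed, status
```

For example, a document that splits into 140 chunks keeps 100 of them and reports "100 chunks indexed (limited from 140)"; the last 40 chunks can never be retrieved.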

Tip: For large documents, split them into multiple focused Knowledge Sources for better coverage.


Source Types

Text

Paste content directly into Cuppa.

Best for:

  • Product descriptions

  • FAQ content

  • Style guidelines

  • Key messaging

  • Contact information

  • Boilerplate text

File Upload

Upload PDF, TXT, or Markdown files up to 50MB.

Best for:

  • Product documentation

  • White papers

  • Research reports

  • Employee handbooks

  • Training materials

Supported formats:

| Format | Notes |
|---|---|
| PDF | Text-based PDFs only. Scanned/image PDFs are not supported (no text to extract). |
| TXT | Plain text files |
| Markdown | .md files treated as text |

Important: PDFs must contain actual text, not images of text. If you can't select and copy text in your PDF, it's image-based and won't work. Re-export it as a text-based PDF or run it through an OCR tool first.


Adding Knowledge Sources

  1. Navigate to AI Instructions > Brand Knowledge

  2. Click Create new knowledge source

  3. Choose source type (Text or File)

  4. Provide content and metadata:

    • Name: Descriptive name (e.g., "Product Features 2024")

    • Description: What this source contains

  5. Click Save

After saving, you'll see indexing stats showing how many chunks were created.

Understanding Indexing Stats

After upload, each source displays:

  • "X chunks indexed": Your content was fully indexed

  • "X chunks indexed (limited from Y)": Content exceeded the 100-chunk limit

If limited, consider splitting the document into smaller, topic-focused sources.


What to Include

Focus on information the AI can't find elsewhere:

✅ Product specifics: Features, pricing, specifications, SKUs

✅ Contact information: Phone numbers, emails, addresses

✅ Brand guidelines: Terminology, messaging, values

✅ FAQs: Common questions with approved answers

✅ Case studies: Customer success stories with metrics

✅ Technical docs: How things work, integrations, specs

✅ Competitive positioning: How you differ from competitors

✅ Policies: Return policies, guarantees, terms

What NOT to Include

❌ Sensitive data: Passwords, API keys, personal customer information

❌ Massive documents: Split into focused topics instead

❌ Outdated information: Causes incorrect outputs

❌ Conflicting information: Creates inconsistent content

❌ Image-based PDFs: Scanned documents without extractable text


Best Practices

Keep Sources Focused

Smaller, topic-specific sources retrieve more accurately than massive documents. Note: a brand can have multiple Knowledge Sources, but you can select only one per generation when building content.

| Content Type | Recommended Approach |
|---|---|
| Product catalog | One source per product line |
| FAQ documents | Group by topic (billing, features, support) |
| Style guidelines | Single comprehensive source |
| Technical docs | Split by feature area |
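If your documents are in Markdown, one way to follow these recommendations is to split a file at its headings and upload each section as its own Knowledge Source. The sketch below is a hypothetical helper, not a Cuppa feature: it assumes each second-level (`##`) heading marks a topic boundary, and sections that share a heading name are merged.

```python
import re


def split_by_heading(markdown: str, level: int = 2) -> dict[str, str]:
    """Group a Markdown document's lines under each heading of the given level.

    Text before the first matching heading is kept under the key "Introduction".
    Deeper headings (e.g. ### when level=2) stay inside their parent section.
    """
    heading = re.compile(r"^" + "#" * level + r"\s+(.*\S)\s*$")
    sections: dict[str, list[str]] = {}
    current = "Introduction"
    for line in markdown.splitlines():
        match = heading.match(line)
        if match:
            current = match.group(1)  # start a new section named after the heading
            sections.setdefault(current, [])
        else:
            sections.setdefault(current, []).append(line)

    result = {}
    for name, body in sections.items():
        text = "\n".join(body).strip()
        if text:  # skip sections that contain only blank lines
            result[name] = text
    return result
```

Each returned section can then be saved as its own file (or pasted as a Text source) with a descriptive name taken from its heading.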

Use Descriptive Names

Good: "Enterprise Pricing 2026" or "Return Policy FAQ"

Bad: "Document1" or "Info"

Names help you manage sources and help Cuppa understand context.

Update Regularly

Knowledge Sources reflect a point in time. Review quarterly:

  1. Remove outdated sources

  2. Update changed information

  3. Add new products/features

Test Retrieval

After adding a source, test it in Agentic Chat:

"What is our phone number for customer support?"

If the answer is correct, your Knowledge Source is working.


How Knowledge Sources Are Used

During Article Generation

When generating content, Cuppa:

  1. Analyzes your topic and keywords

  2. Searches your Knowledge Sources for relevant information

  3. Retrieves up to 12 of the most relevant chunks

  4. Includes that context in the generation prompt

The AI sees your custom information alongside web research, creating content that's both current and accurate to your brand.
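The retrieval step can be sketched with cosine similarity over embedding vectors, a common choice for vector search. This is a simplified illustration under stated assumptions: the vectors in the example are tiny hand-written stand-ins for OpenAI embeddings, `SIMILARITY_THRESHOLD = 0.3` is an invented value (this page mentions a threshold but not its value), and the prompt wording in `build_prompt` is hypothetical. Only the 12-chunk cap comes from this page.

```python
import math

TOP_K = 12                  # documented maximum number of chunks per generation
SIMILARITY_THRESHOLD = 0.3  # HYPOTHETICAL value; Cuppa's real threshold is not documented


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: list[float],
    chunk_vecs: dict[str, list[float]],
    k: int = TOP_K,
    threshold: float = SIMILARITY_THRESHOLD,
) -> list[tuple[str, float]]:
    """Return (chunk_id, score) pairs at or above the threshold, best first, at most k."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    relevant = [pair for pair in scored if pair[1] >= threshold]
    relevant.sort(key=lambda pair: pair[1], reverse=True)
    return relevant[:k]


def build_prompt(topic: str, chunks: list[str]) -> str:
    """Place the retrieved chunks ahead of the writing request as reference context."""
    context = "\n\n".join(f"[Source {i + 1}]\n{text}" for i, text in enumerate(chunks))
    return f"Use the following brand knowledge where relevant:\n\n{context}\n\nWrite about: {topic}"
```

A chunk whose score falls below the threshold is dropped even if it is the best match available, which is one reason content may not be referenced (see Troubleshooting).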

In Agentic Chat

Chat can access your Knowledge Sources directly:

  • Ask questions about your products

  • Request content using specific sources

  • Fact-check against your documentation

Example prompts:

  • "Using our pricing documentation, write a comparison table"

  • "What does our product guide say about the enterprise tier?"

  • "Draft an email using our approved messaging"


Source Management

Updating Content

| Source Type | How to Update |
|---|---|
| Text | Edit directly in Cuppa |
| File | Delete and re-upload |


Troubleshooting

"Content not being referenced"

Possible causes:

  • Content isn't semantically relevant to your topic

  • The content's similarity score falls below the retrieval threshold

Solutions:

  • Ensure your content uses terminology related to your topic

  • Mention the source explicitly in chat for testing

"Wrong information being used"

Cause: Outdated or conflicting sources.

Solution: Audit sources, remove outdated content, resolve conflicts.

"0 chunks indexed"

Cause: PDF is image-based (scanned), not text-based.

Solution: Use a PDF with actual text, or convert with an OCR tool first. If you can't select/copy text in your PDF viewer, it's image-based.

"X chunks indexed (limited from Y)"

Cause: Document exceeded the 100-chunk limit.

Solution: Split into multiple smaller Knowledge Sources organized by topic.

"File upload failed"

Check:

  • File is under 50MB

  • Format is PDF, TXT, or Markdown

  • PDF isn't password-protected

  • PDF contains actual text (not scanned images)
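You can catch most of these problems locally before uploading. The script below is a hypothetical pre-flight check, not part of Cuppa: it assumes the 50MB limit means 50 × 1024 × 1024 bytes, and it detects password protection with a heuristic (encrypted PDFs declare an `/Encrypt` entry in their trailer, which can also appear in PDFs that only restrict permissions). It cannot tell whether a PDF is image-based; for that, try selecting text in a PDF viewer.

```python
from pathlib import Path

MAX_BYTES = 50 * 1024 * 1024  # ASSUMPTION: 50MB interpreted as binary megabytes
ALLOWED_SUFFIXES = {".pdf", ".txt", ".md"}  # formats listed on this page


def preflight(path: str) -> list[str]:
    """Return a list of problems that would likely make the upload fail."""
    file = Path(path)
    suffix = file.suffix.lower()
    problems = []
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported format: {file.suffix or '(none)'}")
    if file.stat().st_size > MAX_BYTES:
        problems.append("file is larger than 50MB")
    if suffix == ".pdf":
        # Heuristic only: look for the /Encrypt key that encrypted PDFs declare.
        if b"/Encrypt" in file.read_bytes():
            problems.append("PDF appears to be password-protected")
    return problems
```

An empty list means none of these checks found a problem; it does not guarantee the upload will succeed.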

