
How to add websites as Datasources in elvex

📌 This article is for users of elvex 2.0. Find out which version you're using.

This guide walks you through adding and configuring a website as a datasource resource in elvex, so you can use its content in your conversations and agents.

Prerequisites

Before we begin, make sure you have:

  • An active elvex account

  • Access to the website feature in your account

  • A website URL you want to add as a datasource

Step 1: Create a new Datasource resource

  1. Click the + in the sidebar

  2. Select Datasource

  3. Select Website as the datasource type

  4. In the Website field, enter the URL of the website you want to add (e.g., https://example.com)

  5. Give your datasource a clear, descriptive name. This helps elvex understand what it contains and surface it at the right time.

  6. Click View full config to access all datasource settings

Tip: elvex automatically validates the URL. If your URL has a path or subpaths (e.g., https://example.com/blog), use Include Pages to configure those paths, as described in Step 2.
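To illustrate the tip above, here is a minimal Python sketch of splitting a website URL into its root and a suggested Include Pages rule. The helper name `split_start_url` is hypothetical, and the validation it performs (an absolute http or https URL with a host) is an assumption; elvex's own URL check may differ.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlparse


def split_start_url(url: str) -> Tuple[str, Optional[str]]:
    """Split a website URL into its root and a suggested Include Pages rule.

    Hypothetical helper for illustration only; not part of elvex.
    """
    parsed = urlparse(url)
    # Assumption: only absolute http(s) URLs with a host count as valid.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a valid website URL: {url!r}")
    root = f"{parsed.scheme}://{parsed.netloc}"
    path = parsed.path.rstrip("/")
    if not path:
        # No path: the whole site is in scope, so no include rule is needed.
        return root, None
    # Escape the literal path, then allow anything below it: "/blog" -> "/blog/.*"
    return root, re.escape(path) + "/.*"


print(split_start_url("https://example.com/blog"))  # ('https://example.com', '/blog/.*')
```

In this sketch you would enter https://example.com in the Website field and /blog/.* as an include rule.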

Step 2: Configure Include Pages (optional)

This lets you include only specific website pages or paths when fetching data. Skip this section if you want to include all pages.

  1. Under Include Pages, select Add Include Rule. You can include up to 5 paths/subpaths.

  2. To include specific sections, add the path name:

    1. To include all blog posts: enter /blog/.*. This would include "https://example.com/blog/post-1" and "https://example.com/blog/post-2" but not "https://example.com/about"

    2. To include multiple sections, select Add Include Rule to add another row

Tip: You can use regular expressions to specify more complex paths!
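The include behavior described in this step can be sketched in Python as follows. It assumes each rule is matched against the URL's path with full-match semantics (`re.fullmatch`); elvex's exact matching rules are not documented here, so treat this as an approximation. The function name `is_included` is hypothetical.

```python
import re
from urllib.parse import urlparse

# Include rules as entered under Include Pages (example from Step 2).
INCLUDE_RULES = [r"/blog/.*"]


def is_included(url: str, rules: list) -> bool:
    """Keep a page if its path fully matches any include rule.

    Assumption: rules are full-matched against the URL path.
    """
    if not rules:  # no include rules means all pages are included
        return True
    path = urlparse(url).path
    return any(re.fullmatch(rule, path) for rule in rules)


for url in (
    "https://example.com/blog/post-1",
    "https://example.com/blog/post-2",
    "https://example.com/about",
):
    print(url, is_included(url, INCLUDE_RULES))
```

This prints True for both blog posts and False for the /about page, matching the example in Step 2.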

Step 3: Configure Exclude Pages (optional)

Leave this empty if there are no pages to exclude.

  1. Under Exclude Pages, select Add Exclude Rule. You can exclude up to 5 paths/subpaths.

  2. Common exclusions might include:

    1. Admin pages: /admin/.*

    2. User profiles: /users/.*
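As a rough illustration of how include and exclude rules might combine, the Python sketch below keeps a page only if it passes both lists. Two assumptions are not confirmed by this article: rules are full-matched against the URL path, and an exclude match takes precedence over an include match. The function name `should_crawl` is hypothetical.

```python
import re
from urllib.parse import urlparse


def should_crawl(url: str, include_rules: list, exclude_rules: list) -> bool:
    """Decide whether a page is kept, given include and exclude rules.

    Assumptions: rules are full-matched against the URL path, and an
    exclude match wins over an include match.
    """
    path = urlparse(url).path or "/"
    if any(re.fullmatch(rule, path) for rule in exclude_rules):
        return False
    if not include_rules:  # empty Include Pages means "all pages"
        return True
    return any(re.fullmatch(rule, path) for rule in include_rules)


EXCLUDES = [r"/admin/.*", r"/users/.*"]
print(should_crawl("https://example.com/admin/settings", [], EXCLUDES))  # False
print(should_crawl("https://example.com/about", [], EXCLUDES))           # True
```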

Step 4: Adjust Advanced Settings (optional)

Click Advanced Settings to access additional options:

Max Depth: Set the maximum depth of pages to crawl (1–5). Default is 2. This controls how deep the crawler will go into your website’s link structure. Starting from the homepage (depth 0), each level of links increases the depth by 1. A higher depth value allows the crawler to follow more links, but may increase crawl time and data volume.

Page Limit: Set the maximum number of pages to crawl (1–100). Default is 20. Use this to prevent crawling too many pages from large websites.
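To show how Max Depth and Page Limit interact, here is a minimal breadth-first crawl sketch in Python. The `links` dictionary is a hypothetical stand-in for fetching pages and extracting their links, and the breadth-first order is an assumption; elvex may visit pages in a different order. The ranges and defaults (1–5, default 2; 1–100, default 20) come from this step.

```python
from collections import deque


def crawl(start: str, links: dict, max_depth: int = 2, page_limit: int = 20) -> list:
    """Breadth-first crawl that stops at max_depth links or page_limit pages.

    `links` is a hypothetical stand-in for fetching a page and extracting its
    links: it maps each page to the pages it links to.
    """
    if not 1 <= max_depth <= 5:
        raise ValueError("Max Depth must be between 1 and 5")
    if not 1 <= page_limit <= 100:
        raise ValueError("Page Limit must be between 1 and 100")
    seen = {start}
    queue = deque([(start, 0)])  # the starting page is depth 0
    collected = []
    while queue and len(collected) < page_limit:
        page, depth = queue.popleft()
        collected.append(page)
        if depth == max_depth:
            continue  # collect this page, but don't follow its links
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return collected


site = {"/": ["/a", "/b"], "/a": ["/a/1"], "/a/1": ["/a/1/x"]}
print(crawl("/", site))  # ['/', '/a', '/b', '/a/1']
```

With the default depth of 2, /a/1/x (three links away) is never reached; raising Max Depth to 3 would include it, while a Page Limit of 2 would stop after / and /a.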

Custom Headers: If your website requires authentication, add custom headers in JSON format. For example:

  • {"Authorization": "Bearer 1234567890"}
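Because the Custom Headers field expects JSON, a quick local check can catch formatting mistakes before you save. The Python sketch below, using the hypothetical helper `parse_custom_headers`, checks that the text is a JSON object whose values are strings; elvex's own validation may be stricter or looser.

```python
import json


def parse_custom_headers(text: str) -> dict:
    """Parse the Custom Headers field and check it is a JSON object of strings.

    Illustrative check only; elvex's own validation may differ.
    """
    headers = json.loads(text)  # raises ValueError (JSONDecodeError) on bad JSON
    if not isinstance(headers, dict):
        raise ValueError("custom headers must be a JSON object")
    for name, value in headers.items():
        if not isinstance(value, str):
            raise ValueError(f"header {name!r} must have a string value")
    return headers


print(parse_custom_headers('{"Authorization": "Bearer 1234567890"}'))
```

A common mistake is using single quotes, as in {'Authorization': 'Bearer 1234567890'}: that is not valid JSON, so `json.loads` rejects it.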

Resync Frequency: Choose how often elvex should refresh the website contents. Default is weekly.

Step 5: Save and process

  1. Review all your settings

  2. Click the Save button at the bottom of the website form

  3. To include another website in this datasource, select Add Website, enter its URL, and repeat Steps 2–4 for it. You can add up to 5 websites per datasource.

  4. When satisfied, select Save & Publish

  5. elvex will begin processing your websites. Once the status indicator turns green, you can click View pages to see which pages were collected.
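The limits mentioned throughout this guide can be summarized as a small configuration model. The Python sketch below is illustrative only: the class and field names are hypothetical and do not reflect elvex's internal schema, and requiring at least one website is an assumption. The defaults (depth 2, 20 pages, weekly resync) and the upper limits (5 websites, 5 include rules, 5 exclude rules) come from the steps above.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WebsiteConfig:
    """One website entry (hypothetical model); defaults mirror this guide."""
    url: str
    include_rules: List[str] = field(default_factory=list)
    exclude_rules: List[str] = field(default_factory=list)
    max_depth: int = 2
    page_limit: int = 20
    custom_headers: Dict[str, str] = field(default_factory=dict)
    resync_frequency: str = "weekly"


@dataclass
class WebsiteDatasource:
    name: str
    websites: List[WebsiteConfig] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the limits are met."""
        found = []
        if not self.name.strip():
            found.append("datasource needs a descriptive name")
        if not 1 <= len(self.websites) <= 5:  # lower bound of 1 is an assumption
            found.append("a datasource holds 1 to 5 websites")
        for site in self.websites:
            if len(site.include_rules) > 5:
                found.append(f"{site.url}: at most 5 include rules")
            if len(site.exclude_rules) > 5:
                found.append(f"{site.url}: at most 5 exclude rules")
            if not 1 <= site.max_depth <= 5:
                found.append(f"{site.url}: Max Depth must be 1-5")
            if not 1 <= site.page_limit <= 100:
                found.append(f"{site.url}: Page Limit must be 1-100")
            for rule in site.include_rules + site.exclude_rules:
                try:
                    re.compile(rule)
                except re.error:
                    found.append(f"{site.url}: invalid regular expression {rule!r}")
        return found


ds = WebsiteDatasource("Company blog", [WebsiteConfig("https://example.com", include_rules=[r"/blog/.*"])])
print(ds.problems())  # []
```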

What’s Next?

Once your website datasource resource is created:

  • It will be resynced automatically at the Resync Frequency you chose (weekly by default)

  • It can be pinned to a Space so it’s always available to your team without manual attachment

  • It can be connected to agents to ground their responses in your web content
