
How to add websites as Datasources in elvex

Learn how to sync a website as a datasource

Updated over 2 weeks ago

This guide will walk you through the process of adding and configuring websites as a datasource, allowing you to harness the power of web content in your elvex applications.

Prerequisites

Before we begin, make sure you have:

  • An active elvex account.

  • Access to the website feature in your account.

  • A website URL you want to add as a datasource.

Step 1: Accessing the Datasources Section

  1. Log in to your elvex account.

  2. Navigate to the Datasources section in the main menu.

  3. Click on the Create new datasource button.

  4. Select Website from the list of datasource types.

Step 2: Adding a Website URL

  1. In the Website field, enter the URL of the website you want to add (e.g., https://example.com).

  2. elvex will automatically validate the URL. If your URL includes a path or subpaths (e.g., https://example.com/blog), use Include Pages to configure those paths, as described in Step 3.
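If you are starting from a URL that already contains a path, the sketch below shows one way to separate the site root from the path before filling in the form. It assumes the site root goes in the Website field and the path goes into an include rule; `split_site_and_path` is a hypothetical helper, not part of elvex.

```python
from urllib.parse import urlsplit


def split_site_and_path(url: str) -> tuple[str, str]:
    """Split an absolute URL into its site root and its path.

    Hypothetical helper for preparing form input; elvex does its own
    URL validation, which may differ from this check.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    site = f"{parts.scheme}://{parts.netloc}"
    # Drop a trailing slash so "/blog/" and "/blog" are treated the same.
    path = parts.path.rstrip("/")
    return site, path


site, path = split_site_and_path("https://example.com/blog")
print(site)  # https://example.com
print(path)  # /blog
```

Here the site root would be entered in Step 2 and the path would become an include rule in Step 3.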

Step 3: Configuring Include Pages (optional)

Include rules limit the crawl to specific website pages or paths when fetching website data. If you want to include all pages, you can skip this section.

  1. Under Include Pages, select Add Include Rule. You can include up to 5 paths/subpaths. To include all pages, leave this empty.

  2. To include specific sections, add the path name for each section.

Tip: You can use regular expressions to specify more complex paths!
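Before saving a regular-expression rule, it can help to try it against a few sample paths locally. The sketch below uses Python's `re` module; the rules and paths are made up for illustration, and it assumes a rule must match the whole path. elvex's exact matching behavior may differ.

```python
import re

# Hypothetical include rules, written in the same style as "/admin/.*".
include_rules = [r"/blog/.*", r"/docs/v[0-9]+/.*"]


def matches_any(path: str, rules: list[str]) -> bool:
    # Assumption: a rule has to match the entire path (re.fullmatch),
    # not just a portion of it.
    return any(re.fullmatch(rule, path) for rule in rules)


for path in ["/blog/2024/launch", "/docs/v2/setup", "/pricing"]:
    print(path, matches_any(path, include_rules))
# /blog/2024/launch True
# /docs/v2/setup True
# /pricing False
```

Under this whole-path assumption, `/blog/.*` matches pages below `/blog/` but not `/blog` itself, so a section's landing page may need its own rule.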

Step 4: Configuring Exclude Pages (optional)

Exclude rules keep specific website pages or paths out of the crawl when fetching website data. Leave this empty if there are no pages to exclude.

  1. Under Exclude Pages, select Add Exclude Rule. You can exclude up to 5 paths/subpaths.

  2. Common exclusions might include:

  • Admin pages: /admin/.*

  • User profiles: /users/.*
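To see how include and exclude rules might interact, here is a minimal sketch of a path filter. It assumes an exclude match always wins over an include match and that each rule must match the whole path; neither is confirmed for elvex. `should_crawl` is a hypothetical name.

```python
import re


def should_crawl(path: str, include_rules: list[str], exclude_rules: list[str]) -> bool:
    """Decide whether a path passes the rules (illustrative only)."""
    # Assumption: an exclude match takes precedence over any include match.
    if any(re.fullmatch(rule, path) for rule in exclude_rules):
        return False
    # An empty include list means "include all pages".
    if not include_rules:
        return True
    return any(re.fullmatch(rule, path) for rule in include_rules)


exclude = [r"/admin/.*", r"/users/.*"]
print(should_crawl("/admin/settings", [], exclude))         # False
print(should_crawl("/about", [], exclude))                  # True
print(should_crawl("/blog/post", [r"/blog/.*"], exclude))   # True
print(should_crawl("/about", [r"/blog/.*"], exclude))       # False
```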

Step 5: Advanced Settings (optional)

Click on Advanced Settings to access additional configuration options:

Max Depth: Set the maximum depth of pages to crawl (1-5). Default is 2. This controls how deep the crawler will go into your website's link structure. Starting from the homepage (depth 0), each level of links increases the depth by 1.

A higher depth value allows the crawler to follow more links, but may increase crawl time and data volume. The default value of 2 is suitable for most use cases.

Page Limit: Set the maximum number of pages to crawl from this website (1-100). Default is 20. Use this to avoid crawling too many pages from large websites.
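The effect of Max Depth and Page Limit can be illustrated with a small breadth-first crawl over a toy link graph. This is a sketch of the general idea, not elvex's crawler: the pages are invented, and the crawl order and whether the homepage counts toward the limit are assumptions.

```python
from collections import deque

# Toy link graph standing in for a website (hypothetical pages).
LINKS = {
    "/": ["/blog", "/docs"],
    "/blog": ["/blog/post-1", "/blog/post-2"],
    "/docs": ["/docs/setup"],
    "/blog/post-1": ["/blog/post-1/comments"],
    "/blog/post-2": [],
    "/docs/setup": [],
    "/blog/post-1/comments": [],
}


def crawl(start: str = "/", max_depth: int = 2, page_limit: int = 20) -> list[tuple[str, int]]:
    """Return (page, depth) pairs in the order they are visited."""
    seen = {start}
    queue = deque([(start, 0)])  # the homepage is depth 0
    visited = []
    while queue and len(visited) < page_limit:
        page, depth = queue.popleft()
        visited.append((page, depth))
        if depth == max_depth:
            continue  # do not follow links beyond the maximum depth
        for link in LINKS.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


for page, depth in crawl():
    print(depth, page)
```

With the default depth of 2, the six pages at depths 0 to 2 are visited and `/blog/post-1/comments` (depth 3) is skipped. Calling `crawl(page_limit=3)` stops after `/`, `/blog`, and `/docs`.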

Custom Headers: If your website requires authentication, you can add custom headers in JSON format.

For example:

  • {"Authorization": "Bearer 1234567890"}
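Because the headers must be valid JSON, it can be worth checking the text locally before pasting it into the form. The sketch below uses Python's standard `json` module; `parse_custom_headers` is a hypothetical helper, and the requirement that every value be a string is an assumption rather than a documented elvex rule.

```python
import json


def parse_custom_headers(raw: str) -> dict[str, str]:
    """Parse a JSON string of headers and check its basic shape."""
    headers = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(headers, dict):
        raise ValueError("custom headers must be a JSON object")
    for name, value in headers.items():
        # Assumption: header values should be plain strings.
        if not isinstance(value, str):
            raise ValueError(f"header {name!r} must have a string value")
    return headers


print(parse_custom_headers('{"Authorization": "Bearer 1234567890"}'))
# {'Authorization': 'Bearer 1234567890'}
```

A common mistake this catches is using single quotes, which JSON does not allow around keys or values.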

Resync Frequency: Choose how often elvex should refresh the contents of this website. Default is weekly.

Step 6: Saving and Processing

  1. Review all your settings to ensure they’re correct.

  2. Click the Save button at the bottom of the website form.

  3. If you want to include another website in this datasource, select Add Website and repeat Steps 2-6. You can add up to 5 websites in a datasource.

  4. When you are satisfied with your configuration, select Save & Publish at the bottom of the page.

  5. elvex will now begin processing your websites. This may take a few minutes, depending on the size of each site and your configured settings. Once the status indicator next to the website URL turns green, you can click the View pages button to see which pages elvex collected data from.

What’s Next?

After successfully connecting elvex to your website(s):

  • The website will be automatically resynced based on your chosen frequency.

  • The content will be indexed and made available for use in your elvex assistants and flows!
