Sifflet Agentᴮᴱᵀᴬ

You can connect Sifflet to sources running inside private networks with the Agent.

🚧

The Sifflet Agent is currently in private beta.

Sifflet will extend the capabilities of the Agent over time. Expect some rough edges when running the Agent. We encourage all Agent users to submit feedback and report any bugs to their Customer Success Manager or Sifflet support.

Please contact Sifflet if you want to enable this feature on your instance.

The Sifflet Agent is a lightweight service that you run inside your own infrastructure. When using the Agent, your sources don't have to be exposed to the Internet.

Overview

The Sifflet Agent queries your Sifflet instance to learn which actions it needs to execute on the data sources, and then queries the data sources from inside the private network. As such, there is no need to open ports for incoming traffic: the Agent only needs to be able to open outgoing HTTPS connections to your Sifflet SaaS instance.

Supported data sources

Currently, the following data sources are supported by the Agent:

Deploying the Agent

Requirements

The Agent is packaged as a Docker image, and runs anywhere you can run Docker containers (including Kubernetes, services such as AWS ECS, or bare virtual machines or servers).

The Agent needs to:

  • Have access to the API of your Sifflet instance, through HTTPS. When using a Sifflet SaaS instance, if you access the Sifflet web application with https://example.siffletdata.com, your API URL is https://example.siffletdata.com/api.
  • Have network access to the data sources it needs to handle, with the credentials you provide in the Sifflet web UI.
  • Use a supported data source (see list above).
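
As a quick pre-flight check, the API URL can be derived from the web application URL and probed from the machine that will run the Agent. This is a minimal sketch: `sifflet_api_url` is a hypothetical helper name, and the `curl` probe in the usage comment only confirms that an outgoing HTTPS connection can be established. It does not validate the access token.

```shell
# Hypothetical helper: build the API URL from the web application URL.
sifflet_api_url() {
  # Strip one trailing slash (if present), then append /api.
  printf '%s/api\n' "${1%/}"
}

# Usage (run from the host or network that will run the Agent):
#   curl --silent --show-error --output /dev/null --max-time 10 \
#     "$(sifflet_api_url "https://example.siffletdata.com")" && echo "HTTPS connection OK"
```

Without `--fail`, `curl` exits successfully for any HTTP response, so a success here only means the network path and TLS handshake work.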

Installation

Create an access token for the Agent

The first step is to create an Access Token with the "Agent Service Account" role. The Agent uses this token to authenticate with your Sifflet instance.

  • Follow the Access Tokens documentation to create the Access Token.
  • The Access Token needs to have the "Agent Service Account" role.
  • Make sure to safely store the Access Token on your side, as it will only be displayed once after creation.

Deploy the agent

Pull the image

The Agent's Docker image is available in the Amazon ECR Public Gallery under the URI public.ecr.aws/sifflet/agent.

For instance, here's how to pull the Agent image to your local machine:

docker pull public.ecr.aws/sifflet/agent:latest

Start the agent

To start the Agent, run the Docker image with the appropriate configuration. You need to provide at least the Sifflet API URL (-u) and the token generated in the previous step (-t).

For example, with the Sifflet API URL https://example.siffletdata.com/api and the token example-token:

docker run public.ecr.aws/sifflet/agent:latest -u "https://example.siffletdata.com/api" -t "example-token"

Do not forget to give the Agent the necessary network permissions to connect to the Sifflet API and to query your data sources.

You will know that the Agent was successfully started when it logs the message Sifflet agent started.
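
When the Agent runs in the background, the startup message can be checked from the container logs. The following is a sketch only: `agent_started` is a hypothetical helper, and `sifflet-agent` is an arbitrary container name chosen for illustration.

```shell
# Hypothetical helper: succeeds once the named container has logged the startup message.
agent_started() {
  docker logs "$1" 2>&1 | grep -q "Sifflet agent started"
}

# Usage (sifflet-agent is an arbitrary container name):
#   docker run -d --name sifflet-agent public.ecr.aws/sifflet/agent:latest \
#     -u "https://example.siffletdata.com/api" -t "example-token"
#   for attempt in 1 2 3 4 5 6 7 8 9 10; do
#     agent_started sifflet-agent && break
#     sleep 3
#   done
```

If the loop ends without a match, inspect the full output of `docker logs sifflet-agent` for startup errors.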

Create a source that runs on the agent

To check that the Agent works as expected, create a data source of a type that the Agent supports. The data source configuration page shows a checkbox that lets you use the Sifflet Agent for this data source.

Check this option, and then use the "Test Connection" button. If you get an error, check the Agent's logs to understand why the connection failed.

Configuration

Agent configuration

To configure the agent, there are three options:

  • Using the CLI options directly.
  • Using the CLI options in a configuration file.
  • Using environment variables.

Using CLI options directly

--instance-access-token, --token or -t (string or flag)

Your Sifflet instance access token. If used as a flag, this will prompt you for the token.

--instance-api-url, --url or -u (string)

Your Sifflet instance API URL

--version (flag)

Display version information

Example usage:

docker run public.ecr.aws/sifflet/agent:latest --url "https://example.siffletdata.com/api" --token "example-token"

Refer to the --help option for details and advanced options.

docker run public.ecr.aws/sifflet/agent:latest --help

Using CLI arguments in a configuration file

You can use a file to pass the same options as presented above. The following is an example file named .options.cmd:

--url https://example.siffletdata.com/api
--token "example-token"

You can then reference it when running the Agent:

docker run public.ecr.aws/sifflet/agent:latest @.options.cmd

Using environment variables

All of the options can also be configured with environment variables. The name of each environment variable is the long name of the option, in upper case, with dashes replaced by underscores, and prepended with AGENT_. For example:

export AGENT_INSTANCE_ACCESS_TOKEN="example-token"
export AGENT_INSTANCE_API_URL="https://example.siffletdata.com/api"
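
Variables exported in your shell are not visible inside the container unless they are passed to it. A minimal sketch of one way to do this with Docker's `-e` flag follows; `run_agent_from_env` is a hypothetical wrapper name.

```shell
# Sketch: start the Agent with its configuration taken from the environment.
run_agent_from_env() {
  # -e NAME (without =value) forwards the variable from the calling shell into the container.
  docker run \
    -e AGENT_INSTANCE_ACCESS_TOKEN \
    -e AGENT_INSTANCE_API_URL \
    public.ecr.aws/sifflet/agent:latest "$@"
}

# Usage:
#   export AGENT_INSTANCE_ACCESS_TOKEN="example-token"
#   export AGENT_INSTANCE_API_URL="https://example.siffletdata.com/api"
#   run_agent_from_env
```

Forwarding by name keeps the token value out of the `docker run` command line and out of your shell history.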

Use the agent in sources

When you create or update a data source of a type that the Agent supports, the data source configuration page shows a checkbox that lets you use the Sifflet Agent for this data source.

You can then test the connection to the data source, and run the data source ingestion, as you would without the Agent.

Running the agent in production

Updates

We recommend keeping the agent up-to-date to get the latest features, bug fixes, and security improvements.

The latest tag of the Docker image points to the latest agent version. You can regularly pull the latest image by referencing this tag.
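
On a plain Docker host, an update could look like the sketch below. `update_agent` is a hypothetical helper and `sifflet-agent` an arbitrary container name; orchestrators such as Kubernetes or ECS have their own rollout mechanisms.

```shell
# Sketch: pull the latest image and recreate the Agent container.
# Any arguments are passed through to the Agent (for example -u and -t).
update_agent() {
  docker pull public.ecr.aws/sifflet/agent:latest || return 1
  # Remove the previous container if it exists; the Agent is stateless, so nothing is lost.
  docker rm -f sifflet-agent >/dev/null 2>&1 || true
  docker run -d --name sifflet-agent public.ecr.aws/sifflet/agent:latest "$@"
}

# Usage:
#   update_agent -u "https://example.siffletdata.com/api" -t "example-token"
```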

Old versions of the agent may stop working if backwards-incompatible changes in the agent API are required. Sifflet will provide ample notice to all affected customers in such situations.

Resource requirements

The resources used by the Sifflet Agent depend on how heavily you rely on it. As a starting point, the Agent can run on 1 core/vCPU and 512 MiB of memory. Monitor the Agent's resource consumption and adjust accordingly.
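
With plain Docker, the starting point above can be applied as container limits. This is a sketch under assumptions: `run_agent_limited` is a hypothetical wrapper, `sifflet-agent` is an arbitrary container name, and treating the suggested values as hard limits is a choice made for this example.

```shell
# Sketch: start the Agent with 1 CPU and 512 MiB of memory as hard limits.
run_agent_limited() {
  docker run -d --name sifflet-agent \
    --cpus 1 --memory 512m \
    public.ecr.aws/sifflet/agent:latest "$@"
}

# Usage:
#   run_agent_limited -u "https://example.siffletdata.com/api" -t "example-token"
#   docker stats --no-stream sifflet-agent   # one-off snapshot of CPU and memory usage
```

A container that exceeds its memory limit is killed by Docker, so raise the limit if `docker stats` shows usage close to it.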

The Sifflet Agent is stateless and doesn't require persistent storage.

Observability and logs

The agent emits JSON-formatted log messages on the standard output. You can capture these logs in your logging pipeline.
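
For ad hoc inspection, the JSON log lines can be pretty-printed with `jq`. The sketch below assumes `jq` is installed and makes no assumption about the field names in the Agent's log messages; `format_agent_logs` is a hypothetical helper and `sifflet-agent` an arbitrary container name.

```shell
# Sketch: pretty-print JSON log lines read from standard input.
format_agent_logs() {
  # -R reads each line as a raw string; lines that are not valid JSON are printed as strings.
  jq -R '. as $line | try fromjson catch $line'
}

# Usage:
#   docker logs -f sifflet-agent 2>&1 | format_agent_logs
```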

As of this writing, the Agent doesn't expose internal metrics.

High availability

You can run multiple agents simultaneously to achieve a high availability setup. Use a different token for each Sifflet Agent.

We recommend running the Agent under a process supervisor (or as a Kubernetes Deployment) so it's automatically restarted if it exits for any reason.
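
With plain Docker, a restart policy can play the role of a process supervisor. A minimal sketch, assuming two access tokens were created beforehand; `run_agent_supervised` and the container names are hypothetical.

```shell
# Sketch: start one named Agent container that Docker restarts automatically.
# $1 = container name, $2 = access token (use a different token for each Agent).
run_agent_supervised() {
  docker run -d --name "$1" --restart unless-stopped \
    public.ecr.aws/sifflet/agent:latest \
    -u "https://example.siffletdata.com/api" -t "$2"
}

# Usage:
#   run_agent_supervised sifflet-agent-1 "example-token-1"
#   run_agent_supervised sifflet-agent-2 "example-token-2"
```

Running the two containers on different hosts also protects against the loss of a single machine.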

Known limitations

For now, the Sifflet Agent has the following limitations:

  • Using the "Preview data" feature on data assets belonging to data sources configured with the Agent will return an error, as this feature is not yet supported with the Agent.
  • Viewing or downloading Failing Rows in a failing monitor linked to an asset from a data source configured with the Agent will return an error, as this feature is not yet supported with the Agent.
  • Images are only provided for x86 architectures. Please let us know if you need images for other architectures.

Troubleshooting

The Agent logs (output on standard output) should contain enough information to investigate most errors.

The Sifflet Agent exits shortly after starting

The Agent will output any error it encounters during startup. Common errors include:

  • The Sifflet API URL is incorrect: check the --instance-api-url parameter. It should look like https://tenant.siffletdata.com/api (note the trailing /api).
  • The API token is incorrect: try generating a new token from the Sifflet UI, and check the --instance-access-token parameter.
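
The URL format can be checked before starting the Agent. This is a sketch only: `check_api_url` is a hypothetical helper that verifies the https:// scheme and the trailing /api, not that the instance exists or that the token is valid.

```shell
# Hypothetical helper: verify that a URL has the expected shape for the Sifflet API.
check_api_url() {
  case "$1" in
    https://*/api) echo "API URL looks well-formed: $1" ;;
    *) echo "Unexpected API URL: $1 (expected something like https://tenant.siffletdata.com/api)" >&2
       return 1 ;;
  esac
}

# Usage:
#   check_api_url "https://tenant.siffletdata.com/api"
```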

Sources that use the Sifflet Agent are reporting an error during refresh

  • Check that the Agent is up and running.
  • Check that the Agent logs don't contain any error.
  • Check that the "Use Sifflet Agent" checkbox is enabled for the source.
  • The Agent will log the jobs it receives from Sifflet. Check that there's activity in the Agent logs when the source is triggered in Sifflet.

Sources that use the Sifflet Agent are stuck in a "Running" or "Scheduled" state

  • Check that the Agent is up and running. The Agent will pick up jobs that were scheduled before it started.
  • Restart the Agent.
  • Check that the Agent logs don't contain any error.
  • The Agent will log the jobs it receives from Sifflet. Check that there's activity in the Agent logs when the source is triggered in Sifflet.

Alternatives

If the Sifflet Agent does not meet your requirements, Sifflet also supports establishing AWS PrivateLink and Azure Private Link connections.