Sifflet Agentᴮᴱᵀᴬ
You can connect Sifflet to sources running inside private networks with the Agent.
The Sifflet Agent is currently in private beta.
Sifflet will extend the capabilities of the Agent over time. Expect some rough edges when running it. We encourage all users of the Agent to share feedback and report bugs to their Customer Success Manager or Sifflet support.
Please contact Sifflet if you want to turn on this feature on your instance.
The Sifflet Agent is a lightweight service that you run inside your own infrastructure. When using the Agent, your sources don't have to be exposed on the Internet.
Overview
The Sifflet Agent queries your Sifflet instance to learn which actions it needs to execute on the data sources, and then queries the data sources from inside the private network. As such, there is no need to open ports for incoming traffic: the Agent only needs to be able to open outgoing HTTPS connections to your Sifflet SaaS instance.
Supported data sources
Currently, the following data sources are supported by the Agent:
Deploying the Agent
Requirements
The Agent is packaged as a Docker image, and runs anywhere you can run Docker containers (including Kubernetes, services such as AWS ECS, or bare virtual machines or servers).
The Agent needs to:
- Have access to the API of your Sifflet instance, through HTTPS. When using a Sifflet SaaS instance, if you access the Sifflet web application with https://example.siffletdata.com, your API URL is https://example.siffletdata.com/api.
- Have network access to the data sources it needs to handle, with the credentials you provide in the Sifflet web UI.
- Use a supported data source (see list above).
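The first requirement can be checked from the host that will run the Agent. The following is a minimal sketch, not part of the Agent itself: the function names sifflet_api_url and check_sifflet_reachable are hypothetical helpers, and the check assumes curl is installed on the host.

```shell
# Derive the API URL from the web application URL by appending /api.
# A single trailing slash on the input is dropped first.
sifflet_api_url() {
  printf '%s/api\n' "${1%/}"
}

# Check that an outgoing HTTPS connection to the given URL can be established.
# Any HTTP response counts as reachable; only connection-level failures
# (DNS, TLS, timeout) make curl exit with a non-zero status here.
check_sifflet_reachable() {
  curl --silent --show-error --output /dev/null --max-time 10 "$1"
}

# Usage:
#   check_sifflet_reachable "$(sifflet_api_url "https://example.siffletdata.com")" \
#     && echo "Sifflet API is reachable"
```

This only verifies outgoing connectivity. It does not validate the access token or the access to your data sources.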
Installation
Create an access token for the Agent
The first step is to create an "Agent Service Account" level Access Token that will be used to authenticate the agent.
- Follow the Access Tokens documentation to create the Access Token.
- The Access Token needs to have the "Agent Service Account" role.
- Make sure to safely store the Access Token on your side, as it will only be displayed once after creation.
Deploy the agent
Pull the image
The Agent's Docker image is available in the Amazon ECR Public Gallery under the URI public.ecr.aws/sifflet/agent.
For instance, here's how to pull the Agent image to your local machine:
docker pull public.ecr.aws/sifflet/agent:latest
Start the agent
To start the Agent, run the Docker image with the appropriate configuration. You need to provide at least the Sifflet API URL (-u) and the token generated in the previous step (-t).
For example, with the Sifflet API URL https://example.siffletdata.com/api and the token example-token:
docker run public.ecr.aws/sifflet/agent:latest -u "https://example.siffletdata.com/api" -t "example-token"
Do not forget to give the Agent the necessary network permissions to connect to the Sifflet API and to query your data sources.
You will know that the Agent started successfully when it logs the message "Sifflet agent started".
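If you start the Agent in the background, you can wait for that message from a script. The following is a sketch under assumptions: wait_for_agent is a hypothetical helper, the container name sifflet-agent is an arbitrary choice, and the Agent is assumed to run as a detached Docker container whose output is readable with docker logs.

```shell
# Wait until the named container logs the startup message, or give up.
# $1: container name, $2: timeout in seconds (defaults to 60).
wait_for_agent() {
  name=$1
  timeout=${2:-60}
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # docker logs replays the container's stdout and stderr.
    if docker logs "$name" 2>&1 | grep -q "Sifflet agent started"; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}

# Usage (the container name "sifflet-agent" is an arbitrary choice):
#   docker run -d --name sifflet-agent public.ecr.aws/sifflet/agent:latest \
#     -u "https://example.siffletdata.com/api" -t "example-token"
#   wait_for_agent sifflet-agent 60 && echo "Agent is up"
```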
Create a source that runs on the agent
To check that the Agent works as expected, create a data source supported by the agent. You will have a checkbox giving you the option to use the Sifflet Agent on this data source.
Check this option, and then use the "Test Connection" button. If you get an error, check the Agent's logs to understand why the connection failed.
Configuration
Agent configuration
To configure the agent, there are three options:
- Using the CLI options directly.
- Using the CLI options in a configuration file.
- Using environment variables.
Using CLI options directly
--instance-access-token, --token or -t (string or flag)
Your Sifflet instance access token. If used as a flag, this will prompt you for the token.
--instance-api-url, --url or -u (string)
Your Sifflet instance API URL.
--version (flag)
Display version information.
Example of usage:
docker run public.ecr.aws/sifflet/agent:latest --url "https://example.siffletdata.com/api" --token "example-token"
Refer to the --help option for details and advanced options.
docker run public.ecr.aws/sifflet/agent:latest --help
Using CLI arguments in a configuration file
You can use a file to pass the same options as presented above. The following is an example file named .options.cmd:
--url https://example.siffletdata.com/api
--token "example-token"
You can then reference it when running the Agent:
docker run public.ecr.aws/sifflet/agent:latest @.options.cmd
Using environment variables
All of the options can also be configured with environment variables. The name of each environment variable is the long name of the option, in upper case, with dashes replaced by underscores, and prepended with AGENT_. For example:
export AGENT_INSTANCE_ACCESS_TOKEN="example-token"
export AGENT_INSTANCE_API_URL="https://example.siffletdata.com/api"
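Variables exported in your shell are not visible inside a Docker container unless you pass them explicitly. The sketch below illustrates the naming rule and one way to forward the variables with docker run. The helper option_to_env is hypothetical and only mirrors the rule described above.

```shell
# Convert a long CLI option name into its environment variable name:
# strip the leading dashes, upper-case the letters, replace dashes with
# underscores, and prepend AGENT_.
option_to_env() {
  printf 'AGENT_%s\n' "$(printf '%s' "${1#--}" | tr 'a-z-' 'A-Z_')"
}

# Usage:
#   option_to_env --instance-api-url    # prints AGENT_INSTANCE_API_URL
#
# Forward the exported variables to the container. "-e NAME" without a value
# copies the value of NAME from the calling environment:
#   docker run \
#     -e AGENT_INSTANCE_ACCESS_TOKEN \
#     -e AGENT_INSTANCE_API_URL \
#     public.ecr.aws/sifflet/agent:latest
```

Passing the token through the environment keeps it out of the command line, where it could otherwise show up in shell history or process listings.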
Use the agent in sources
When creating or updating a data source supported by the agent on the data source configuration page, you will have a checkbox giving you the option to use the Sifflet Agent on this data source.
You can then test the connection to the data source, and run the data source ingestion, as you would without the Agent.
Running the agent in production
Updates
We recommend keeping the agent up-to-date to get the latest features, bug fixes, and security improvements.
The latest tag of the Docker image points to the latest agent version. You can regularly pull the latest image by referencing this tag.
Old versions of the agent may stop working if backwards-incompatible changes in the agent API are required. Sifflet will provide ample notice to all affected customers in such situations.
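For an agent running as a plain Docker container, an update can be sketched as pulling the image and recreating the container. This is an illustration under assumptions, not an official procedure: update_agent is a hypothetical helper, and the container name is whatever you chose when starting the agent.

```shell
# Pull the latest image, then replace the running container with a new one.
# $1: container name; remaining arguments are passed to the agent as-is.
update_agent() {
  image="public.ecr.aws/sifflet/agent:latest"
  name=$1
  shift
  # Stop here if the pull fails, so the running agent is left untouched.
  docker pull "$image" || return 1
  # Remove the old container if it exists; ignore the error if it does not.
  docker rm -f "$name" >/dev/null 2>&1 || true
  docker run -d --name "$name" "$image" "$@"
}

# Usage:
#   update_agent sifflet-agent -u "https://example.siffletdata.com/api" -t "example-token"
```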
Resource requirements
The resources used by the Sifflet Agent will depend on how heavily you rely on it. As a starting point, the agent can run on 1 core/vCPU and 512 MiB of memory. Monitor the agent resource consumption and adjust accordingly.
The Sifflet Agent is stateless and doesn't require persistent storage.
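With Docker, the suggested starting point can be expressed as container limits. The sketch below uses the standard --cpus and --memory options of docker run. The wrapper run_agent_with_limits and the container name sifflet-agent are hypothetical, and the limits are only the starting values mentioned above.

```shell
# Start the agent with 1 CPU and 512 MiB of memory as initial limits.
# All arguments are passed to the agent as-is.
run_agent_with_limits() {
  docker run -d --name sifflet-agent \
    --cpus 1 --memory 512m \
    public.ecr.aws/sifflet/agent:latest "$@"
}

# Usage:
#   run_agent_with_limits -u "https://example.siffletdata.com/api" -t "example-token"
#
# Take a one-off snapshot of the container's CPU and memory usage:
#   docker stats --no-stream sifflet-agent
```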
Observability and logs
The agent emits JSON-formatted log messages on the standard output. You can capture these logs in your logging pipeline.
As of this writing, the Agent doesn't expose internal metrics.
High availability
You can run multiple agents simultaneously to achieve a high availability setup. Use a different token for each Sifflet Agent.
We recommend running the Agent under a process supervisor (or as a Kubernetes Deployment) so it's automatically restarted if it exits for any reason.
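On a single Docker host, both recommendations can be approximated with a restart policy and one container per token. This is a minimal sketch, assuming plain Docker rather than Kubernetes: start_agents and the container names sifflet-agent-1, sifflet-agent-2, and so on are hypothetical. Containers on the same host do not protect against a host failure, so spread the agents over several hosts where that matters.

```shell
# Start one agent container per access token. Docker restarts each container
# automatically if it exits, unless it was stopped explicitly.
# $1: Sifflet API URL; remaining arguments: one access token per agent.
start_agents() {
  url=$1
  shift
  i=1
  for token in "$@"; do
    docker run -d --name "sifflet-agent-$i" --restart unless-stopped \
      public.ecr.aws/sifflet/agent:latest -u "$url" -t "$token"
    i=$((i + 1))
  done
}

# Usage:
#   start_agents "https://example.siffletdata.com/api" "token-for-agent-1" "token-for-agent-2"
```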
Troubleshooting
The Agent logs (output on standard output) should contain enough information to investigate most errors.
The Sifflet Agent exits shortly after starting
The Agent will output any error it encounters during startup. Common errors include:
- The Sifflet API URL is incorrect: check the --instance-api-url parameter. It should look like https://tenant.siffletdata.com/api (note the trailing /api).
- The API token is incorrect: try generating a new token from the Sifflet UI, and check the --instance-access-token parameter.
Sources that use the Sifflet Agent are reporting an error during refresh
- Check that the Agent is up and running.
- Check that the Agent logs don't contain any error.
- Check that the "Use Sifflet Agent" checkbox is enabled for the source.
- The Agent will log the jobs it receives from Sifflet. Check that there's activity in the Agent logs when the source is triggered in Sifflet.
Sources that use the Sifflet Agent are stuck in a "Running" or "Scheduled" state
- Check that the Agent is up and running. The Agent will pick up jobs that were scheduled before it started.
- Restart the Agent.
- Check that the Agent logs don't contain any error.
- The Agent will log the jobs it receives from Sifflet. Check that there's activity in the Agent logs when the source is triggered in Sifflet.
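When the Agent runs as a Docker container, the restart and log checks above can be combined in one step. The helper restart_and_show_logs is a hypothetical sketch, and the default container name sifflet-agent is an arbitrary choice.

```shell
# Restart the agent container, then print its most recent log lines.
# $1: container name (defaults to "sifflet-agent").
restart_and_show_logs() {
  name=${1:-sifflet-agent}
  docker restart "$name" &&
    docker logs --since 10m "$name" 2>&1 | tail -n 50
}

# Usage:
#   restart_and_show_logs sifflet-agent
# Then trigger the source in Sifflet and run the command again to confirm
# that the Agent logs the job it receives.
```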
Alternatives
If the Sifflet Agent does not meet your requirements, Sifflet also supports establishing AWS PrivateLink and Azure Private Link connections.