Data security and cloud storage integrations
Updated on April 26th, 2024
How Sama secures your data
Data security is often the first concern when it comes to data storage. That's why Sama is committed to industry-standard data security and protection practices. Sama follows both the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) to ensure your information is used only when necessary.
In addition to keeping your personal information secure, Sama puts security and compliance at the forefront of how it operates.
Cloud storage integrations
Sama offers a cloud storage integration with Azure Blob Storage, Google Cloud Platform and AWS S3. Sama directly accesses images, videos, and 3D assets from your cloud storage without retaining a copy.
However, certain assets, such as videos and 3D assets, need to be processed and transformed by Sama to ensure compatibility with our platform. Once transformed, these assets can be saved back to your cloud storage and accessed (through a CDN) by Sama's annotators.
As an alternative, assets can be uploaded to and/or accessed directly from Sama's own S3 storage in Germany, the USA, and India.
| Hosting details | Pros | Cons |
|---|---|---|
| Complete integration with your cloud storage | • Sama never retains a copy of your assets • You have complete control over assets | • Cloud platform configuration of permissions required • Asset security outside the workspace cannot be guaranteed due to outside infrastructure • Azure, GCP support for image, video and 3D. |
| Assets are fetched from your cloud storage but retained in Sama's S3 | • You have complete control over uploaded assets • Full compatibility with all Sama annotation options | • Service interruptions may affect access • Assets will be copied to Sama's S3 bucket for the duration of the project |
| Assets are uploaded directly to Sama's S3 | • Sama is better able to prevent service interruptions • Full compatibility with all Sama annotation options | • Assets will be copied to Sama's S3 bucket for the duration of the project |
Azure Blob Storage
Use native Identity and Access Management (IAM) roles and policies to provide Sama access to your data. The Azure integration supports image, video, and 3D assets, and Sama never retains a copy of your asset.
How Sama Authenticates with Your Azure Instance
- Sama obtains a token from AWS Cognito Identity, which is set up as a trusted issuer in your Azure instance.
- Using the token from AWS Cognito Identity, Sama authenticates with Azure AD. This process results in obtaining a workload identity. In our context, this workload identity corresponds to an App registration that has Federated credentials configured.
- Finally, Sama communicates with Azure Blob Storage using the acquired workload identity. This allows Sama to manage resources within the Storage Account and Storage Container that have been pre-configured.
Workload identities provide a method for securely accessing Azure resources, eliminating the need for the storage and management of secrets, such as usernames, passwords, or client secrets.
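The flow above is, in effect, an OAuth 2.0 client-credentials exchange in which the externally issued token stands in for a client secret. The sketch below only assembles the request that such an exchange would send to the Microsoft identity platform token endpoint; it performs no network call. The function name, the placeholder IDs, and the storage scope are assumptions made for illustration and are not taken from Sama's implementation.

```python
from urllib.parse import urlencode

# Fixed value defined by the OAuth 2.0 JWT bearer assertion profile (RFC 7523).
JWT_BEARER = "urn:ietf:params:oauth:client-assertion-type:jwt-bearer"


def build_token_request(tenant_id, client_id, external_token):
    """Return (url, form_body) for trading an external token for an Azure AD token.

    Hypothetical helper: it only builds the request and does no network I/O.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        # The federated token (for example, one issued by Cognito) replaces a secret.
        "client_assertion_type": JWT_BEARER,
        "client_assertion": external_token,
        # Assumed scope for Azure Storage data access.
        "scope": "https://storage.azure.com/.default",
    }
    return url, urlencode(form)


# Placeholders only; real values come from the App registration.
url, body = build_token_request("<tenant-id>", "<client-id>", "<external-token>")
print(url)
```

Because the request carries a short-lived assertion instead of a stored secret, nothing long-lived needs to be shared between the two clouds.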
Read more about workload identities
Workload identities in Azure come in two main types: Azure AD Applications and Managed Identities.
- Azure AD Applications: These are applications or services that have been registered within Azure AD. Each possesses its own unique identity and can authenticate and access Azure resources by using this identity. Specific characteristics of Azure AD Applications include:
- They are capable of obtaining access tokens from Azure AD. These tokens can then be used to access Azure resources, APIs, or other services, according to the permissions that have been assigned to the identity.
- They can establish trusted relationships with other identity providers (for example, AWS Cognito Identity). This trust is established by acknowledging tokens issued by these external identity providers, thereby enabling cross-access to resources among different cloud providers. This is typically achieved using App registrations.
- They leverage protocols such as OAuth 2.0 or OpenID Connect to acquire access tokens.
- They can be designated specific roles and permissions, allowing for a nuanced control of access to Azure resources. Access rights are determined based on the roles assigned to the identity.
- Managed identities: These are tightly associated with the Azure resources they belong to and are used for authenticating and directly accessing those resources.
- Sama's integration with Azure doesn't make use of managed identities.
Steps to set up Azure with Sama
1. Set Up Azure Storage:
- Create or use an existing Azure Storage Account.
- Create or use an existing Azure Storage Container.
2. Register an Azure Application:
- Go to App registrations and register an application for Sama.
- Save the Application (client) ID and the Directory (tenant) ID for future use.
3. Configure Federated Credentials:
- On the Sama application's page, select "Certificates & Secrets" from the menu and create new Federated credentials.
- Under the "Federated credential scenario" field, select "Other issuer".
- Input the following details:
- Issuer (provided by Sama): https://cognito-identity.amazonaws.com
- Subject identifier: eu-west-1:72a28b0b-b4cc-443f-9032-a397c1ef692a
- Audience: eu-west-1:e4639e61-9b32-4a7f-aeb9-9987f28d102d
4. Grant Access to Azure Storage:
- Go to your Azure Storage Container and/or Account and select "Access Control (IAM)" from the menu.
- Add a Role assignment for the registered Sama application with the following roles:
- Storage Blob Data Contributor (for the Azure Storage Container)*
- Storage Blob Delegator (for the Azure Storage Account)
5. Configure Resource sharing (CORS):
- Select Resource sharing (CORS) from the menu.
- In Blob Storage, set:
- “Allowed origins” to *
- app.sama.com
- “Allowed methods” to GET, PUT
- “Max Age” to 3000
6. Configure Sama Account:
- Navigate to your organization details page in https://accounts.sama.com.
- Fill in the "Integration Azure" section with your Azure Application client ID and tenant ID and save the values.
- Fill in the Azure Storage Account and Storage Container fields with the location where you want processed assets to be written.*
7. Test Your Configuration:
- Validate your setup by using a URL to access an asset in your Azure Storage Container.
* Assets are processed (transformed) on the Sama Platform for compatibility. The steps above store these processed assets in your Azure Storage for the Sama workforce. If you prefer using Sama's storage:
- In step 4, select Storage Blob Data Reader instead of Storage Blob Data Contributor.
- In step 6, do not fill the Azure Storage Account and Storage Container fields.
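As a rough illustration of the Azure setup steps, the sketch below collects the federated credential values and the role choice from the footnote into plain Python data. The issuer, subject, and audience are the values listed in the steps; the credential name and both function names are hypothetical. The dictionary follows the shape of a federated identity credential as used by Azure tooling, but treat it as an assumption to check against your own tenant rather than a definitive payload.

```python
import json


def federated_credential(name="sama-federated-credential"):
    """Federated credential values from the setup steps (the name is hypothetical)."""
    return {
        "name": name,
        "issuer": "https://cognito-identity.amazonaws.com",
        "subject": "eu-west-1:72a28b0b-b4cc-443f-9032-a397c1ef692a",
        "audiences": ["eu-west-1:e4639e61-9b32-4a7f-aeb9-9987f28d102d"],
    }


def roles_for(store_processed_assets_in_azure):
    """Pick role assignments, following the footnote on processed assets."""
    if store_processed_assets_in_azure:
        container_role = "Storage Blob Data Contributor"
    else:
        # Read-only variant: processed assets are kept in Sama's storage instead.
        container_role = "Storage Blob Data Reader"
    return {"container": container_role, "account": "Storage Blob Delegator"}


print(json.dumps(federated_credential(), indent=2))
print(roles_for(store_processed_assets_in_azure=False))
```

Keeping these values in one place makes it easier to review them against the portal before testing the configuration.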
AWS S3
Assets are fetched from your cloud storage through cross-account access. Video and 3D assets are retained in Sama's S3.
Configure your S3 bucket policy as follows:
Replace <BUCKET_NAME> with the name of the bucket Sama will need access to. This will give Sama read-only access to the entire contents of the bucket.
If more granular access is needed, the arn:aws:s3:::<BUCKET resource can be replaced with a list of resources that include the paths to which Sama will be granted access, such as arn:aws:s3:::<BUCKET and arn:aws:s3:::<BUCKET_NAME>/other/path/that/sama/needs/*.
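To make the shape of such a policy concrete, here is a minimal sketch that generates a read-only bucket policy, either for the whole bucket or for a list of paths. It is not Sama's official policy: the principal ARN is a placeholder that must come from Sama, the statement ID and function name are hypothetical, and limiting the actions to s3:GetObject is an assumption about what "read-only" needs to cover.

```python
import json


def read_only_bucket_policy(bucket_name, principal_arn, prefixes=None):
    """Build a read-only S3 bucket policy for one principal (illustrative only)."""
    if prefixes:
        # Granular access: one resource per path the principal may read.
        resources = [
            f"arn:aws:s3:::{bucket_name}/{prefix.strip('/')}/*" for prefix in prefixes
        ]
    else:
        # Whole-bucket access.
        resources = [f"arn:aws:s3:::{bucket_name}/*"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadObjects",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": ["s3:GetObject"],  # assumed minimal read-only action
                "Resource": resources,
            }
        ],
    }


# "<SAMA_PRINCIPAL_ARN>" is a placeholder; the real value is provided by Sama.
policy = read_only_bucket_policy("<BUCKET_NAME>", "<SAMA_PRINCIPAL_ARN>")
print(json.dumps(policy, indent=2))
```

Passing a list such as `["other/path/that/sama/needs"]` as `prefixes` narrows the grant to those paths only.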
CORS configuration
If you want to keep your assets hosted on your cloud storage service, please ensure that your CORS configuration is properly set up:
The Origin will be https://app.sama.com
The Access-Control-Request-Method is GET
Here is a sample AWS S3 CORS bucket configuration that will enable the Sama platform to properly serve images.
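A configuration along these lines can be sketched as follows. The origin and method come from the requirements above; the allowed headers and max age are assumptions added for illustration, so adjust them to your own policy. The script prints a JSON list of CORS rules in the form S3 uses for bucket CORS configuration.

```python
import json

# Origin and method are taken from the requirements above.
# AllowedHeaders and MaxAgeSeconds are assumptions, not Sama-provided values.
cors_rules = [
    {
        "AllowedOrigins": ["https://app.sama.com"],
        "AllowedMethods": ["GET"],
        "AllowedHeaders": ["*"],
        "MaxAgeSeconds": 3000,
    }
]

print(json.dumps(cors_rules, indent=2))
```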
Google Cloud Storage
Use native Identity and Access Management (IAM) roles and policies to provide Sama access to your data. The Google Cloud Storage integration supports image, video, and 3D assets, and Sama never retains a copy of your assets.
How Sama Authenticates with Your Google Cloud Storage Instance
- Sama obtains GCP credentials for its Service Account (sama-external@rd-prod-398911.iam.gserviceaccount.com).
- Sama impersonates your Service Account.
- Finally, Sama communicates with GCP Storage using a token generated for your service account (impersonated).
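The impersonation step can be pictured as a call to the IAM Service Account Credentials API, made by Sama's service account on behalf of yours. The sketch below only builds that request and sends nothing. The function name, the default lifetime, and the choice of the read-write storage scope are assumptions for illustration; the caller would also need its own bearer token and the "Service Account Token Creator" role on the target account.

```python
import json


def build_impersonation_request(target_service_account, lifetime_seconds=3600):
    """Return (url, json_body) for a generateAccessToken call (no network I/O)."""
    url = (
        "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/"
        f"{target_service_account}:generateAccessToken"
    )
    body = {
        # Assumed scope: read-write access to Cloud Storage.
        "scope": ["https://www.googleapis.com/auth/devstorage.read_write"],
        "lifetime": f"{lifetime_seconds}s",
    }
    return url, json.dumps(body)


# Placeholder email in the documented <service-account>@<project-id> form.
url, body = build_impersonation_request(
    "<service-account>@<project-id>.iam.gserviceaccount.com"
)
print(url)
print(body)
```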
Read more about service account impersonation:
Steps to set up Google Cloud Storage with Sama
1. Setting Up GCP Cloud Storage
- Create a new GCP Cloud Storage Bucket or use an existing one.
- Upload a file to the Cloud Storage Bucket or use an existing file.
2. Creating a Service Account
- Navigate to IAM & Admin > Service Accounts and create a new service account for Sama.
- Save the service account's email for future reference.
3. Granting Roles to the Service Account:
- Assign the "Service Account Token Creator" role to the service account:
- Go to the service account page and select the Permissions tab.
- Click on “Grant Access.”
- Under "New principals," enter the service account's email.
- For the Role, choose “Service Account Token Creator.”
- Grant the "Storage Object User"* role on the Storage Bucket (see the footnote if read-only access is sufficient):
- Navigate to the Storage Bucket page and select the Permissions tab.
- Click on “Grant Access.”
- Under "New principals," input the service account's email.
- For the Role, choose “Storage Object User.”
- [Optional] For enhanced security, you can specify IAM conditions that target the buckets and path which should be accessible by the service account.
- For example, "Condition type Tag has key sama-bucket/read-write"
- Grant Sama "Service Account Token Creator" role on your service account:
- Go to the service account page and select the Permissions tab.
- Click on "Grant Access."
- Under "New principals," enter Sama's service account's email:
sama-external@rd-prod-398911.iam.gserviceaccount.com
- For the Role, choose “Service Account Token Creator.”
4. Configure Your Sama Account:
- Navigate to your organization details page at https://accounts.sama.com.
- Complete the "Integration GCP" section with your GCP Service Account Email.
Service Account Email: <service-account>@<project-id>.iam.gserviceaccount.com
- Fill in the Bucket and Prefix fields with the location where you want processed assets to be written.* (See the footnote if read-only access is sufficient.)
5. Test Your Configuration:
- Validate your setup by using a URL to access an asset in your Cloud Storage Bucket. We support the following GCP URL types: https://cloud.google.com/storage/docs/request-endpoints.
https://BUCKET_NAME.storage.googleapis.com/OBJECT_NAME
https://storage.googleapis.com/BUCKET_NAME/OBJECT_NAME
* Assets may be processed (transformed) on the Sama Platform for compatibility. The steps above store these processed assets in your Google Cloud Storage for the Sama workforce. If you prefer using Sama's storage or no processing is needed:
- In step 3b, select Storage Object Viewer instead of Storage Object User.
- In step 4c (Configure Your Sama Account), do not fill in the Bucket and Prefix fields.
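When preparing asset URLs for the test step, it can help to confirm that each one matches one of the two supported GCP URL forms. The following is a small sketch using only the standard library; the function name is hypothetical, and it covers just the two forms listed in the steps.

```python
from urllib.parse import unquote, urlparse

GCS_HOST = "storage.googleapis.com"


def parse_gcs_url(url):
    """Split a supported Cloud Storage URL into (bucket, object_name)."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    path = parsed.path.lstrip("/")
    if host == GCS_HOST:
        # Path style: https://storage.googleapis.com/BUCKET_NAME/OBJECT_NAME
        bucket, _, object_name = path.partition("/")
    elif host.endswith("." + GCS_HOST):
        # Virtual-hosted style: https://BUCKET_NAME.storage.googleapis.com/OBJECT_NAME
        bucket = host[: -len("." + GCS_HOST)]
        object_name = path
    else:
        raise ValueError(f"Not a supported Cloud Storage URL: {url}")
    if not bucket or not object_name:
        raise ValueError(f"Missing bucket or object name: {url}")
    return bucket, unquote(object_name)


print(parse_gcs_url("https://storage.googleapis.com/my-bucket/images/cat.png"))
```

Both URL forms resolve to the same bucket and object pair, so either can be used consistently across a project.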
Pre-signed and Public URLs
You can also send pre-signed or public URLs of your assets as an alternative to IAM Delegated Access or Cross-Account Access.
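Pre-signed URLs expire, so it may be worth checking how long each one remains valid before sending it. The sketch below reads the expiry from an AWS Signature Version 4 pre-signed URL by combining its X-Amz-Date and X-Amz-Expires query parameters. It assumes S3-style pre-signed URLs only; other providers encode expiry differently, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url):
    """Return the UTC time at which a SigV4 pre-signed URL stops working."""
    params = parse_qs(urlparse(url).query)
    # X-Amz-Date is the signing time in the form YYYYMMDDTHHMMSSZ (UTC).
    signed_at = datetime.strptime(params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    # X-Amz-Expires is the validity period in seconds from the signing time.
    return signed_at + timedelta(seconds=int(params["X-Amz-Expires"][0]))


# Made-up example URL; the signature value is a placeholder.
example = (
    "https://example-bucket.s3.amazonaws.com/cat.png"
    "?X-Amz-Date=20240426T120000Z&X-Amz-Expires=3600&X-Amz-Signature=placeholder"
)
print(presigned_url_expiry(example).isoformat())
```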