The input metadata provides essential information about the video (the sequence of images that makes up the motion picture), including its size, height, and width.
{
"data":{
"sensor-location":"https://assets.samasource.org/...",
"right_camera Camera initial rotation":"[0,0,0]",
"left_camera Camera initial rotation":"[0,0,0]",
"center_camera Camera initial rotation":"[0,0,0]"
}
}
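In the sample above, each camera's initial rotation is stored as a string that itself contains a JSON array. A minimal Python sketch of decoding those strings follows; it assumes the key naming shown in the sample ("<camera> Camera initial rotation"), and the helper name initial_rotations is hypothetical, not part of any official tooling.

```python
import json

# Sample copied from the metadata above; the URL is elided in the source.
metadata = {
    "data": {
        "sensor-location": "https://assets.samasource.org/...",
        "right_camera Camera initial rotation": "[0,0,0]",
        "left_camera Camera initial rotation": "[0,0,0]",
        "center_camera Camera initial rotation": "[0,0,0]",
    }
}

# Assumption: every rotation key ends with this suffix, as in the sample.
SUFFIX = " Camera initial rotation"


def initial_rotations(meta):
    """Return {camera_name: list_of_numbers} for every rotation key found."""
    rotations = {}
    for key, value in meta["data"].items():
        if key.endswith(SUFFIX):
            camera = key[: -len(SUFFIX)]
            # The value is a JSON-encoded string, e.g. "[0,0,0]" -> [0, 0, 0].
            rotations[camera] = json.loads(value)
    return rotations


rotations = initial_rotations(metadata)
print(rotations)
```

Decoding once up front keeps later code from repeatedly parsing strings when it needs the numeric values.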
Answers
To understand the 'answer' in this JSON, two concepts are key: 'workspace output' and 'shape output'.
Workspace Output: This is the collective data generated from a digital workspace, encompassing all activities, elements, and their results within that space.
Shape Outputs: These are detailed outputs related to graphical 'shapes' within the workspace, like cuboids, each with specific data such as size and color.
In this context, the "answers" element in the JSON acts as a repository that holds both the overall workspace output and the specific data related to individual shapes. This information is organized in a dictionary format using key-value pairs, making it easier to access and manage the diverse range of outputs generated in the workspace.
A fused annotation output refers to an integrated annotation data structure, which combines information from multiple sources or viewpoints, such as different camera angles (left, right, center cameras in this case). This fusion enhances the accuracy and completeness of the annotation.
"answers" Object: This is the main container holding all annotation data.
Camera Sections ("left_camera", "right_camera", "center_camera"): Each section contains an array of objects representing annotations from different camera perspectives. These sections allow annotations to be organized based on the source of the data (camera view).
"shapes" array: Within each camera section, there's an array of "shapes", which holds the details of each annotated object. Each shape object includes "tags" (like "object_class": "vehicle"), "type" (e.g., "rectangle"), and an "index" for identification.
"key_locations" and "locations" arrays:
These arrays contain detailed annotation data for each shape, including points defining the shape's geometry, visibility status, frame number, and additional tags like "occlusion" and "truncation".
"key_locations": A key location marks a frame in which a shape was edited, whether through updates to its tags or adjustments to the shape itself. It acts as a marker for important changes made to the annotated shape in that particular frame.
"group_type" and "frame_count": These fields offer details on the group responsible for the annotation of this task and the total count of frames involved in the annotation process.
A rectangle is a four-sided shape in which each side meets its adjacent sides at a right angle. It is defined by four coordinates, arranged as [[x1,y1],[x1,y2],[x2,y1],[x2,y2]], where (x1,y1) and (x2,y2) are diagonally opposite corners. This structure ensures that opposite sides of the rectangle are parallel and equal in length, and that all angles are right angles. In this JSON, the rectangle is represented by the coordinates (2869,7), (3263,7), (2869,674), and (3263,674), which correspond to its four corners.
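As a small illustration, the four corners quoted above can be reduced to a bounding box and its dimensions. This is a sketch only: the function name rectangle_bounds is hypothetical, and it deliberately uses min/max so that it does not depend on the order in which the corners are listed.

```python
def rectangle_bounds(points):
    """Return (x_min, y_min, x_max, y_max) for a list of [x, y] corners."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# Corner coordinates taken from the example in the text.
points = [[2869, 7], [3263, 7], [2869, 674], [3263, 674]]

x_min, y_min, x_max, y_max = rectangle_bounds(points)
width = x_max - x_min    # 3263 - 2869 = 394
height = y_max - y_min   # 674 - 7 = 667
print(width, height)
```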
The "index" is a unique identifier assigned to each shape to distinguish it from the others. It is sequential: each shape receives a consecutive number based on its order in the sequence.
{
"index":1
}
Tag
Tags are descriptive attributes assigned to a shape to provide additional information or classification. Depending on how the project is set up, tags can take different formats:
1. Multi-Level Menu: A hierarchical format, such as "category|sub category", indicating categories and subcategories.
2. Dropdown: A selection from predefined options.
3. Radio Button: A choice of one option from a set.
4. Checkbox: A nested structure with binary choices (0 or 1) to indicate features like "left," "none," or "right" roadside.
These varied tag formats provide detailed and specific information about the object, facilitating nuanced classification and understanding.
{
"tags":{
"object_class":"vehicle"
}
}
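To make the tag formats more concrete, here is a hedged Python sketch of reading two of them. The helper names are hypothetical, and the checkbox layout (a mapping from option name to 0 or 1) is an assumption based on the description of binary choices; real project output may be nested differently.

```python
def split_multilevel(value, separator="|"):
    """Split a multi-level menu value into its levels, trimming spaces."""
    return [part.strip() for part in value.split(separator)]


def selected_checkboxes(options):
    """Return the option names whose flag is 1.

    Assumes `options` maps each option name to 0 or 1.
    """
    return [name for name, flag in options.items() if flag == 1]


# Multi-level value in the format quoted in the text.
levels = split_multilevel("category |sub category")

# Hypothetical checkbox value using the option names mentioned in the text.
roadside = selected_checkboxes({"left": 0, "none": 1, "right": 0})

print(levels)
print(roadside)
```

Dropdown and radio button tags carry a single selected value, as in "object_class": "vehicle", so they can be read directly without a helper.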
Key Locations
Shape output
In the 3D world, shapes are drawn on specific objects within a scene, producing a list of 'shape outputs.' These are known as 'nested outputs' because of their hierarchical organization. They can be part of a larger workspace output or appear as nested objects in a JSON file.
Tags
Tags are descriptive attributes assigned to a shape to provide additional information or classification. Depending on how the project is set up, tags can take different formats:
1. Multi-Level Menu: A hierarchical format, such as "category|sub category", indicating categories and subcategories.
2. Dropdown: A selection from predefined options.
3. Radio Button: A choice of one option from a set.
4. Checkbox: A nested structure with binary choices (0 or 1).
These varied tag formats provide detailed and specific information about the object, facilitating nuanced classification and understanding.
A rectangle is a four-sided shape in which each side meets its adjacent sides at a right angle. It is defined by four coordinates, arranged as [[x1,y1],[x1,y2],[x2,y1],[x2,y2]], where (x1,y1) and (x2,y2) are diagonally opposite corners. This structure ensures that opposite sides of the rectangle are parallel and equal in length, and that all angles are right angles. In this JSON, the rectangle is represented by the coordinates (2869,7), (3263,7), (2869,674), and (3263,674), which correspond to its four corners.
The 'visibility' element indicates the visibility of a shape in a given frame. A shape might be visible in one frame but obscured by another object in the next, rendering it invisible. However, it can reappear in subsequent frames. This element helps track the shape's visibility across different frames.
Visible
{
"visibility": 1
}
Not visible
{
"visibility": 0
}
Frame number
The 'frame_number' in a 'key_frame' represents its position within the overall sequence of frames.
{
"frame_number": 0
}
Locations
Locations cover the frames that are interpolated between two key frames. If a shape isn't visible in a frame, its visibility value is set to 0 and there is no associated points array for that frame. That's why some locations are empty arrays. Inside a location you find all the shape outputs: visibility, points, and tags.
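Because some locations are empty and others have no points, code that reads them needs to tolerate both cases. The following Python sketch shows one way to collect the frames in which a shape is visible. The sample data is hypothetical and only uses the fields named in the text (frame_number, visibility, points, tags); the function name visible_frames is likewise an illustration.

```python
def visible_frames(locations):
    """Return the frame numbers of entries whose visibility is 1."""
    frames = []
    for entry in locations:
        # Skip empty entries and anything that is not an object.
        if not entry or not isinstance(entry, dict):
            continue
        if entry.get("visibility") == 1:
            frames.append(entry["frame_number"])
    return frames


# Hypothetical sample: one visible frame, one empty location,
# and one frame where the shape is not visible (no points array).
locations = [
    {
        "frame_number": 1,
        "visibility": 1,
        "points": [[2869, 7], [3263, 7], [2869, 674], [3263, 674]],
        "tags": {"object_class": "vehicle"},
    },
    [],
    {"frame_number": 3, "visibility": 0},
]

frames = visible_frames(locations)
print(frames)
```

Checking visibility before touching points avoids a lookup error on frames where the points array is absent.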