UX Design for Computer Vision and Check-out Free Retail

Designing a seamless user experience for efficient human annotation of AI datasets in retail stores.

This article details the design of a video annotation app for check-out free retail experiences.

The app is designed to streamline annotation and support the customer journey in check-out free retail. It includes features such as annotation history, review, and export to support collaborative annotation and data management.

Streamlining Data Annotation for Video

One of the main challenges in creating a video annotation app for check-out free retail is the need to streamline the annotation process to improve efficiency and accuracy. This is particularly challenging when dealing with large volumes of video data, as manual annotation can be time-consuming and prone to errors.

Additionally, there is a need to support collaboration and data management, as multiple users may be working on the same project or sharing annotation data between different systems.

The goal is to design a video data annotation app that solves the problem of obtaining high-quality annotations for video AI datasets, improves the performance of machine learning models, and supports a variety of applications.

Build It!

The video annotation app is designed to provide accurate and detailed annotation data to improve the performance of computer vision models and enable a seamless check-out free experience.

The app will provide functionalities such as annotation history, annotation review, and annotation export to support collaborative annotation and data management.

Overview Page

A dataset annotation overview and status page should provide a clear, easy-to-use interface for tracking the progress and performance of the annotation process. The page should include a variety of charts, tables, and data points to give a comprehensive overview of annotation progress and performance; a sketch of how these statistics might be computed follows the list of elements below.

An easy-to-use user interface should include clear and simple navigation, intuitive design, and easy access to all the charts, tables, and data points. Additionally, the interface should be responsive and accessible from any device, such as a desktop, laptop, or tablet.

Progress bar: A progress bar that shows the overall completion rate of the annotation process.

Annotation performance charts: Charts that show the overall performance of the annotation team, such as the number of images/frames annotated per hour, the number of rejected images/frames, and the average annotation time per image/frame.

Quality control statistics: A table that shows the percentage of images/frames that passed quality control, the percentage of images/frames that failed quality control, and the number of images/frames that were reviewed by the quality control team.

Annotator performance charts: Charts that show the performance of individual annotators, such as the number of images/frames annotated per hour, the number of rejected images/frames, and the average annotation time per image/frame.

Annotation categories breakdown: A pie chart that shows the breakdown of the annotation categories, such as the number of images/frames annotated for object detection, object tracking, and activity recognition.

Confidence score: A chart that shows the distribution of confidence scores for the annotations; this helps to identify whether more annotation is needed for certain objects or certain frames.

Annotator feedback form: A form that allows annotators to provide feedback on the annotation process, such as suggestions for improving the annotation guidelines or issues encountered during the annotation process.
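
As a rough illustration of how these overview statistics could be computed, the sketch below aggregates a list of annotation records. The record fields and function names are illustrative assumptions, not part of the design itself.

```python
from dataclasses import dataclass

@dataclass
class AnnotationRecord:
    """One annotated image/frame, as logged by the annotation backend."""
    annotator: str
    frame_id: int
    seconds_spent: float
    passed_qc: bool
    confidence: float  # annotation confidence in [0.0, 1.0]

def dashboard_stats(records: list[AnnotationRecord]) -> dict:
    """Aggregate the headline numbers shown on the overview page."""
    total = len(records)
    if total == 0:
        return {"frames_annotated": 0}
    hours = sum(r.seconds_spent for r in records) / 3600
    return {
        "frames_annotated": total,
        "qc_pass_rate": sum(r.passed_qc for r in records) / total,
        "frames_per_hour": total / hours if hours else 0.0,
        "avg_seconds_per_frame": sum(r.seconds_spent for r in records) / total,
    }
```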

Annotation Page

The annotation page is the main page of the video annotation tool, where users perform the annotation of video data sets to recognize products in brick-and-mortar stores. The goal of this page is to make it extremely easy for humans to annotate video data sets and recognize products, making the check-out free experience possible. Each user will be assigned to specific projects.

Video player: The video player will allow users to view the videos that they are annotating. It should be intuitive and easy to use, with controls for play, pause, and rewind. The player should also have a timeline that shows the current frame number and the total number of frames.
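
A minimal sketch of the timeline’s frame bookkeeping, assuming OpenCV as the video backend (any decoding library would do):

```python
import cv2  # pip install opencv-python

def play_with_timeline(path: str) -> None:
    """Step through a video, reporting the current frame against the total."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    fps = cap.get(cv2.CAP_PROP_FPS)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        current = int(cap.get(cv2.CAP_PROP_POS_FRAMES))
        print(f"frame {current}/{total} at {current / fps:.2f}s")
    cap.release()
```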

Annotation tools: The annotation tools should be simple and intuitive, and should include options for creating bounding boxes, points, lines, and polygons, as well as options for labeling and classifying objects and activities in the video. The tools should also include options for adjusting the size and position of the annotations.
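
One way these shapes might be represented internally is a single points-based record; the sketch below is an assumption about the data model, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    """One shape drawn on one frame."""
    frame_id: int
    shape: str                         # "box", "point", "line", or "polygon"
    points: list[tuple[float, float]]  # pixel coords; a box stores two corners
    label: str                         # product or activity class

def move(ann: Annotation, dx: float, dy: float) -> Annotation:
    """Reposition an annotation, as the adjustment tools would."""
    shifted = [(x + dx, y + dy) for x, y in ann.points]
    return Annotation(ann.frame_id, ann.shape, shifted, ann.label)
```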

Annotation history: The annotation history will allow users to view the annotations that they have created, as well as to edit or delete existing annotations. This will help users to keep track of their work and to make corrections if necessary.

Annotation review: The annotation review will allow users to review their annotations, and to make sure that they are accurate and complete. This will help users to identify any errors or missing annotations, and correct them before exporting the data.

Annotation export: The annotation export will allow users to export the annotation data in the desired format. The export should support a variety of formats, such as CSV, JSON, or XML.
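
A minimal sketch of the export step for two of those formats, assuming annotations are held as flat dictionaries:

```python
import csv
import json

def export_annotations(anns: list[dict], path: str, fmt: str = "json") -> None:
    """Write annotation records in the user's chosen format."""
    if fmt == "json":
        with open(path, "w") as f:
            json.dump(anns, f, indent=2)
    elif fmt == "csv":
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=anns[0].keys())
            writer.writeheader()
            writer.writerows(anns)
    else:
        raise ValueError(f"unsupported format: {fmt}")
```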

Object and activity recognition: The annotation interface could include pre-configured object and activity recognition functionalities that could help users quickly and accurately recognize objects and activities in the video and reduce the time spent on annotation. This could include using pre-trained object detection models or using image recognition algorithms to automatically identify and classify objects in the video. This feature could also include the ability to add new objects or activities to the system for recognition, which would allow for easy integration with the store’s inventory management system.
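
As a sketch of what model-assisted pre-annotation could look like, the snippet below uses torchvision’s pre-trained Faster R-CNN. The model choice and score threshold are assumptions; a production system would fine-tune on the store’s own product imagery.

```python
import torch
from torchvision.models.detection import (
    FasterRCNN_ResNet50_FPN_Weights, fasterrcnn_resnet50_fpn)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()

def suggest_annotations(frame: torch.Tensor, threshold: float = 0.7):
    """Propose boxes and labels for one frame (CHW float tensor in [0, 1])
    that an annotator can then accept or correct."""
    with torch.no_grad():
        out = model([frame])[0]
    keep = out["scores"] >= threshold
    names = [weights.meta["categories"][i] for i in out["labels"][keep]]
    return list(zip(out["boxes"][keep].tolist(), names,
                    out["scores"][keep].tolist()))
```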

Real-time feedback: The annotation interface could also include real-time feedback, such as accuracy scores, to help users to quickly identify and correct any errors in their annotations. This feature could also include suggestions for improving the accuracy of the annotations, such as providing tips on how to create more accurate bounding boxes or labels.
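
One plausible basis for such an accuracy score is the intersection-over-union (IoU) between the annotator’s box and a reference box, such as a model suggestion or a reviewer’s box; a minimal sketch:

```python
def iou(a: tuple, b: tuple) -> float:
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

# A score near 1.0 means the annotator's box closely matches the reference.
print(iou((10, 10, 50, 50), (12, 12, 52, 52)))
```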

Metadata Automation

Creating an interface that enables quick data annotation and the reuse of existing metadata is crucial for brick-and-mortar stores that aim to provide a check-out free shopping experience.

Preset metadata: The application can allow users to save specific metadata as a preset for reuse on one or more videos. This would save users the effort of manually entering the same information for different videos, and help to increase the efficiency of the annotation process.

Metadata buttons: The application can include buttons to the right of the metadata fields in the annotation interface that allow users to perform a variety of actions such as opening the folder in which the video appears, saving metadata to the file, resolving metadata conflicts, and more.

Synchronizing metadata: The application can allow users to synchronize specific metadata in selected videos with metadata in another video. This would provide a fast way to add information and IPTC metadata to videos and save users the effort of repeatedly typing the same metadata into videos.
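
A minimal sketch of the preset and synchronization behavior, assuming video metadata lives in plain dictionaries (names are illustrative):

```python
import copy

PRESETS: dict[str, dict] = {}  # named metadata presets, kept in memory here

def save_preset(name: str, metadata: dict) -> None:
    """Store a reusable metadata preset (store ID, camera, shelf zone, ...)."""
    PRESETS[name] = copy.deepcopy(metadata)

def apply_preset(name: str, videos: list[dict]) -> None:
    """Merge a preset into each video without clobbering fields already set."""
    for video in videos:
        for key, value in PRESETS[name].items():
            video.setdefault(key, value)

def sync_metadata(source: dict, targets: list[dict], fields: list[str]) -> None:
    """Copy selected fields from one video's metadata to others."""
    for video in targets:
        for f in fields:
            video[f] = source[f]
```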

Automated annotation: The application can use automated annotation tools like object detection, semantic segmentation, and OCR to provide users with a quick and accurate annotation process.
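
For the OCR piece, for example, pytesseract (a Python wrapper around the Tesseract engine, which must be installed separately) could pre-fill text fields from cropped regions such as price tags; this tooling choice is an assumption:

```python
from PIL import Image  # pip install pillow pytesseract
import pytesseract

def read_label_text(image_path: str) -> str:
    """Extract printed text (e.g. a price tag) from a cropped frame region."""
    return pytesseract.image_to_string(Image.open(image_path)).strip()
```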

Additional Pages

Login/Registration page: This page will allow users to create an account and log in to the annotation tool. It will also include information about the project and the annotation process, as well as any guidelines or instructions that users should follow when annotating videos.

Video selection page: This page will allow users to select the videos that they would like to annotate. It will include a list of available videos, along with information such as the video length and the types of annotations that are required.

Review and export page: This page will allow users to review their annotations and export the annotation data in the desired format. It will also allow users to view and edit previously created annotations.

Design Process

The design process would be iterative: based on feedback gathered during user testing, the design would be refined and improved through multiple rounds of testing and feedback. This allows for continuous improvement of the app’s design and ensures that it meets users’ needs and addresses their pain points.

  1. Wireframes: The first stage of the design process would involve creating wireframes, which are simple, low-fidelity sketches of the app’s layout and functionality. Wireframes help to establish the basic structure and layout of the app and can be used to test and refine ideas before moving on to more detailed designs.
  2. Mockups: Once the wireframes have been refined, the next stage would involve creating mockups, which are more detailed and polished visual representations of the app’s design. Mockups are used to test and refine the visual design and can be used to get feedback on the overall look and feel of the app.
  3. Prototypes: The next stage of the design process would involve creating interactive prototypes, which are functional representations of the app. Prototypes allow users to interact with the app in a realistic way, and can be used to test and refine the app’s functionality and usability.
  4. User testing: Throughout the design process, user testing would be conducted to gather feedback on the wireframes, mockups, and prototypes. Testing can be conducted in person or remotely, and it is essential to test with real users to validate design decisions and improve the app’s usability.

Evaluation

Success in this project can be defined by meeting the following metrics:

High user satisfaction with the annotation app can be quantified by measuring the percentage of users who rate the app as “very satisfied” or “extremely satisfied” on a user feedback survey. A target success value for this metric could be 80% or higher.

High accuracy and consistency in the annotations can be quantified by measuring the inter-annotator agreement (IAA) or the F1-score of the annotations. IAA measures how similar the annotations of two or more annotators are, while the F1-score measures how well the annotations match a gold-standard reference set. A target success value could be 80% or higher for IAA and 90% or higher for the F1-score.
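
For categorical labels, IAA is commonly computed as Cohen’s kappa; the sketch below uses scikit-learn (one common choice) with made-up labels for two annotators and a gold-standard reference:

```python
from sklearn.metrics import cohen_kappa_score, f1_score

labels_a = ["shampoo", "soda", "soda", "chips", "soda"]   # annotator A
labels_b = ["shampoo", "soda", "chips", "chips", "soda"]  # annotator B
print(cohen_kappa_score(labels_a, labels_b))  # inter-annotator agreement

gold = ["shampoo", "soda", "soda", "chips", "soda"]       # reference labels
print(f1_score(gold, labels_b, average="micro"))          # F1 vs. the gold set
```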

High efficiency and productivity in the annotation process can be quantified by measuring the speed and volume of annotations produced. This can be measured by the number of annotations produced per hour or per day. A target success value for this metric could be 100 annotations per hour or higher.

High scalability and robustness of the system can be quantified by measuring the system’s ability to handle a large volume of video data and handle different types of videos. This can be measured by the number of videos that can be processed per hour, the number of concurrent users that can use the system, and the number of different video formats that the system can support. A target success value for this metric could be the ability to process 1000 videos per hour or higher, support for 100 concurrent users, and support for at least 5 different video formats.

Technical Implementation and Deployment

Once the design has been finalized, the next step would be to implement the design in code. The technical implementation of the video annotation app would involve several steps, including:

  • Choosing the appropriate programming languages and libraries: The app would be built using a combination of languages and libraries well-suited to video annotation, for example Python for the computer vision back end and a JavaScript framework for the browser-based front end.
  • Designing the app’s architecture and infrastructure: The app would be designed with a scalable architecture to support the large volume of data and ensure that it can handle the demands of the annotation process. The app would be hosted on cloud infrastructure, to ensure that it can be accessed from anywhere and is highly available.
  • Implementing the app’s functionality: The app would be implemented to provide the functionality described in the design process, including the ability to annotate videos using different formats, such as bounding boxes, points, lines, and polygons, and functionalities such as annotation history, annotation review, and annotation export.
  • Testing and quality assurance: Before deployment, the app would be thoroughly tested, with quality assurance performed to ensure that it is free of bugs and meets the requirements and standards.

Relevant Python Libraries

The computer vision libraries that would be relevant for this project would depend on the specific requirements and use case of the project. For example, if the project requires object detection, libraries such as TensorFlow Object Detection API or YOLO (You Only Look Once) could be used. For semantic segmentation, libraries such as DeepLab or U-Net could be used. For OCR, libraries such as Tesseract or Google Cloud Vision could be used.
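
For instance, a pre-trained YOLO model can be run in a few lines via the ultralytics package; the package choice and file names here are assumptions for illustration:

```python
from ultralytics import YOLO  # pip install ultralytics

model = YOLO("yolov8n.pt")          # small pre-trained detection model
results = model("shelf_frame.jpg")  # a frame exported from a store video
for box in results[0].boxes:
    print(box.xyxy[0].tolist(),            # corner coordinates (x1, y1, x2, y2)
          results[0].names[int(box.cls)],  # predicted class name
          float(box.conf))                 # confidence score
```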