
OCR Block

Identify and locate specific text on a webpage using OCR.

The OCR (Optical Character Recognition) Block scans the page for the text you specify and returns its coordinates, which later steps can use for actions like clicking or highlighting. It is particularly useful for dynamic or non-standard web content, where HTML or CSS selectors may not be able to reach the element.


Functionality

The OCR Block scans the webpage and detects the text visible on screen. Because it relies on Optical Character Recognition (OCR) rather than the page's markup, it can identify text even within images or non-standard elements. Once the target text is found, the block returns its coordinates (X, Y), allowing you to act on or near the detected text.
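As a rough illustration of the idea above, the sketch below searches a list of recognized words for a target phrase and derives one (X, Y) point from the matching bounding boxes. The `WordBox` shape, the `locate_text` helper, and the choice to return the center of the match are all assumptions made for this example; the OCR Block's internal logic and the exact point it reports are not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WordBox:
    """One recognized word and its bounding box (hypothetical shape)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def locate_text(boxes: list[WordBox], target: str) -> Optional[tuple[int, int]]:
    """Return the center (x, y) of the first run of words matching target."""
    wanted = target.lower().split()
    if not wanted:
        return None
    for start in range(len(boxes) - len(wanted) + 1):
        run = boxes[start:start + len(wanted)]
        if [b.text.lower() for b in run] == wanted:
            # Bounding box that encloses every word in the matched run.
            left = min(b.left for b in run)
            top = min(b.top for b in run)
            right = max(b.left + b.width for b in run)
            bottom = max(b.top + b.height for b in run)
            return ((left + right) // 2, (top + bottom) // 2)
    return None


# Made-up OCR output for a page that contains a "Sign in" button.
page = [
    WordBox("Welcome", 40, 20, 120, 24),
    WordBox("Sign", 300, 200, 40, 20),
    WordBox("in", 345, 200, 20, 20),
]
print(locate_text(page, "sign in"))  # (332, 210)
```

Returning `None` when nothing matches lets a caller decide whether to retry, scroll, or stop, instead of acting on a stale coordinate.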

You can access the coordinates returned by the OCR Block using the following syntax, where state_key stands for the state key under which the block's result is stored:

  • X coordinate: {{$state.state_key.coordinate[0]}}
  • Y coordinate: {{$state.state_key.coordinate[1]}}

This is especially useful for automating interactions with buttons or labels that are generated dynamically and cannot be reached through typical HTML or CSS selectors.
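To make the coordinate placeholders above more concrete, here is a minimal sketch of how such a template could be resolved against stored state. The `render` function, the regular expression, and the `login_button` state key are hypothetical and exist only for this example; the platform's actual template engine may behave differently.

```python
import re

# Matches {{$state.<key>.coordinate[<index>]}} and captures key and index.
PLACEHOLDER = re.compile(r"\{\{\$state\.(\w+)\.coordinate\[(\d+)\]\}\}")


def render(template: str, state: dict) -> str:
    """Replace each coordinate placeholder with the value stored in state."""
    def substitute(match: re.Match) -> str:
        key, index = match.group(1), int(match.group(2))
        return str(state[key]["coordinate"][index])
    return PLACEHOLDER.sub(substitute, template)


# Hypothetical state after an OCR Block stored its result under "login_button".
state = {"login_button": {"coordinate": [332, 210]}}
template = (
    "click at {{$state.login_button.coordinate[0]}}, "
    "{{$state.login_button.coordinate[1]}}"
)
print(render(template, state))  # click at 332, 210
```

Index 0 resolves to the X coordinate and index 1 to the Y coordinate, matching the order shown in the list above.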


Key Features

  • Text Recognition: The OCR Block uses Optical Character Recognition to detect text, including text inside images or non-standard formats that the page's markup does not expose.
  • Precise Coordinates: Once the text is identified, the block returns its coordinates, so later steps can click, highlight, or scroll to that location.
  • Dynamic Content Interaction: The block suits webpages that generate text dynamically or where DOM-based interaction is difficult, because it works from what is visible on screen rather than from selectors.

Use Cases

  • Interacting with Dynamic Text: Automate clicks or highlights based on specific text that appears on a webpage at runtime, such as headlines, labels, or other generated content.
  • Text Extraction from Images: Detect text that is embedded within images and therefore absent from the page's HTML.
  • Form Filling Automation: Detect and interact with text on forms that may not be accessible through conventional selectors, so automation can continue even on complex web interfaces.
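For the form-filling case above, the detected text is often a label, while the element to click sits next to it. The sketch below offsets a returned coordinate and keeps the result inside the page area. The `point_near` helper, the default viewport size, and the assumption that the field lies 200 pixels to the right of the label are illustrative only.

```python
def point_near(coordinate, dx=0, dy=0, viewport=(1280, 720)):
    """Offset an (x, y) coordinate and clamp the result inside the viewport."""
    x, y = coordinate
    width, height = viewport
    # Clamp so the click never lands outside the visible page area.
    clamped_x = max(0, min(width - 1, x + dx))
    clamped_y = max(0, min(height - 1, y + dy))
    return (clamped_x, clamped_y)


# Hypothetical coordinate returned for an "Email" label on a form.
label = (120, 340)
# Assume the input field sits about 200 pixels to the right of the label.
print(point_near(label, dx=200))  # (320, 340)
```

The right offset depends on the layout of the page, so it is worth checking against the actual form before relying on it.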

Why Use the OCR Block?

The OCR Block is suited to scenarios where interacting with text is essential but selector-based methods fall short, especially with dynamically generated text, text embedded within images, or non-standard web elements. Because it works from what is rendered on screen, the block lets you interact with visible text that has no usable selector, which broadens what your automation workflows can handle.


With the OCR Block, you can efficiently identify and interact with text on a webpage, making your automation workflows more adaptable and powerful in handling dynamic or visually complex content.