The process of converting a collection of paper-based documents into a digital version can be a very daunting and time-consuming task. Data entry workers must gather paper documents, sort them, read them line by line, type necessary details into a computer, review their work for mistakes, and then deal with disposing of the original papers afterward to avoid further buildup.
Thankfully, there is no need for things to be so cumbersome in 2022! Using Azure Computer Vision, you can quickly analyze an image and extract its text to create a digitized version. Your data entry workers only need to scan the documents as images and leave the rest to Azure Computer Vision.
In this tutorial, you'll learn how to build a paper digitization solution that works at scale using Azure Functions and Azure Computer Vision.
Azure Cognitive Services and Azure Computer Vision
If you have any experience with machine learning or deep learning, you'll know that building a solution that performs cognitive operations, such as understanding text, detecting objects in an image, or transcribing audio, requires significant know-how.
This know-how is typically outside the developer's skill set and requires specialized machine learning engineers who know about different image and audio processing techniques.
With Azure Cognitive Services, however, there is no need to master all these complex skills. Azure encapsulates all cognitive operations into ready-made APIs that developers can use immediately without needing to understand the underlying machine learning/deep learning science.
One of the many services offered under Azure Cognitive Services is Azure Computer Vision. It enables a wide range of functionalities, such as:
- Optical character recognition (OCR): used to extract printed and handwritten text from images and documents.
- Image understanding: used to extract a wide variety of visual features from an image.
- Spatial analysis: used to understand people's movement in space in real-time.
Optical Character Recognition
Optical character recognition, which you'll focus on heavily in this article, is the process of converting handwritten or printed text in images into machine-readable text. There is a wide range of applications for OCR, such as:
- Vehicle plate recognition.
- Digitizing books and unstructured documents.
- Handwritten signature verification in banking systems.
You'll use the OCR service from the Azure Computer Vision resource when you work through the tutorial below.
Traditionally, deploying a software application meant maintaining a full stack of infrastructure resources. This stack spans everything from networking, storage, and servers up to the application itself. As you can imagine, managing this complex stack of components consumes valuable time and resources from businesses.
Cloud computing, however, reduces how much of that stack a business has to own by shifting responsibility for particular infrastructure pieces from the business to the vendor. One example of this is serverless functions.
By using serverless functions, developers can quickly develop particular pieces of business logic without worrying about the underlying infrastructure stack. Serverless functions make development faster and deployments quicker since developers own only a small part of the application responsibility. The rest is left to the cloud vendor. Azure Functions, hosted in Function Apps, are Microsoft's implementation of serverless computing.
In this tutorial, an Azure Function serves as the entry point of the solution: it receives HTTP requests containing images and passes those images to the OCR service for analysis.
The solution you’ll be implementing in this tutorial consists of three main elements:
- An Azure Function triggered through an HTTP request (REST API).
- An Azure Cognitive Service that performs the computer vision operations.
- A client that consumes the Azure Function REST API.
Here’s a visual of the architecture for the project:
As demonstrated in the image above, the solution you'll be building works as follows:
- A client sends an image containing the text to be recognized to the Azure Function's HTTP endpoint.
- The Azure Function sends the image to Azure Cognitive Services (in particular, the Computer Vision service) to recognize the text using the OCR capability.
- The Azure Cognitive Service returns the recognition result to the Azure Function.
- The Azure Function parses the result returned by the Azure Cognitive Service and returns a JSON response containing the recognized lines of text to the client.
Implementing Scalable OCR with Azure
This section provides step-by-step instructions to create your Azure Cognitive Services resource and Azure Function, develop the OCR code, deploy your solution to Microsoft Azure, and test it using an API client.
Before diving in, here are a few prerequisites you'll need to get started:
- Visual Studio Community Edition 2022. This is where you'll develop your Azure Function. (Note: If you are using macOS, you may use Visual Studio for Mac. The tutorial steps may differ a bit, as noted below.)
- Active Azure subscription. This is where you'll deploy your Azure Function and Cognitive Services resources. The free trial should suffice for this tutorial.
- Postman. This is the HTTP client you'll use to test your solution.
Step 1: Create Your Azure Cognitive Service
In Microsoft Azure, all cognitive services are accessible from the Cognitive Services resource, which is hosted as an API endpoint accessible by using a specific key.
In the Azure dashboard, click Create a resource. In the search bar, type "Cognitive Services." You'll get information about the cognitive services resource and a legal notice. Click Create.
You'll need to specify the following details about the cognitive service (refer to the image below for a completed example of this page):
- Subscription: choose your paid or trial subscription, depending on how you created your Azure account.
- Resource group: click create new to create a new resource group or choose an existing one. For this tutorial, I'll create a new one and call it "ocr-rg."
- Region: choose the Azure region for your cognitive service. For best performance, choose the closest region to your geographical area. I'll select "West Europe."
- Name: choose a name for your cognitive service. I'll put "ocr-cognitive-service." This name should be unique across Azure.
- Pricing Tier: choose "Standard S0" as the pricing tier, or you can choose "Free F0" for the free tier.
Then click Review + create. You'll get a screen with your choices for validation, as shown in the image below. Make sure the validation passes and then click Create.
Azure will take a few seconds to create the resource. After you see "Your deployment is complete," click Go to resource. Then click on Keys and Endpoint on the left. Here, you'll need to take note of Key 1 or Key 2 (either is fine, just be sure to keep them secure) and the endpoint URL:
Step 2: Create a Function App
Next, in your Azure dashboard, click Create a resource once more. This time, search for "Function App" in the search bar. You'll get some information about the function app resource and a legal notice. Click Create.
Here, you'll need to specify the following details (refer to image below for completed version):
- Subscription: choose the subscription where you created the resource group "ocr-rg."
- Resource Group: select "ocr-rg."
- Name: choose a name for your function app. I'll choose "ocr-function-app-0." This name should be unique across Azure.
- Publish as: choose "Code."
- Runtime stack: choose ".NET."
- Version: choose "6." This is the latest .NET version, as of the time of writing the article.
- Region: for best performance, choose the closest region to your geographical area. I'll select "West Europe."
Then, click Review + create. You'll see a screen with your choices for your review, as shown in the image below. Make sure the configurations are correct, and then click Create.
Again, Azure will take a few seconds to create the resource. After it's finished, you'll see a message reading, "Your deployment is complete."
Step 3: Create a Function App Solution in Visual Studio
Now, you'll create a Function App in Visual Studio.
Launch Visual Studio 2022 and click on Create a new project. In the search box, type "Azure Functions." Choose the "Azure Functions Project" and then hit Next:
Give your project a name such as "OCRSolution" and then click Create.
Now, you'll need to choose a trigger to execute your Azure function. Since you want your Azure function to trigger whenever an image is sent to it using an API, you'll select "HTTP Trigger."
Then, change the authorization level on the left to "Anonymous" to keep things simple.
Note: do not do this in a production solution, since you'll need to authorize your consumers before using the function.
Make sure your settings match the ones pictured here:
Click Create. In a few seconds, Visual Studio will generate your Azure Function.
Step 4: Develop the OCR Function App
Next, rename the auto-generated function in the project to "OCRFunction" (and remember to rename the function class file as well). Also, delete everything in the OCRFunction.cs file (make it a blank file).
You'll need to add a NuGet package to use the Cognitive Services Computer Vision client from your code. To do so, follow these steps:
- Right-click the solution in Solution Explorer (this may differ in Visual Studio for Mac)
- Select Manage NuGet Packages for Solution (this may differ in Visual Studio for Mac)
- Switch to the Browse tab
- Type `Microsoft.Azure.CognitiveServices.Vision.ComputerVision`
- Select the NuGet package from the list
- Select your project to install your package, "OCRSolution"
- Choose version "7.0.1"
- Click Install
Then, you'll add the following namespaces in the OCRFunction.cs file:
These namespaces include ordinary C# namespaces as well as the `Microsoft.Azure.CognitiveServices.Vision.ComputerVision` namespace for Azure Computer Vision.
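As a rough guide, the using directives for this file might look like the following. This is a sketch that assumes the in-process Azure Functions model and version 7.0.1 of the Computer Vision package; the exact list in your project may differ.

```csharp
// Ordinary .NET namespaces
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// ASP.NET Core types used by HTTP-triggered functions (HttpRequest, IFormFile, IActionResult)
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;

// Azure Computer Vision client and its result models (OcrResult, OcrRegion, OcrLine, OcrWord)
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;

// Azure Functions attributes and logging (assumes the in-process model)
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;
```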
In the OCRFunction class, you'll define variables for your cognitive service `endpoint` and `key`, which you noted earlier when you created the service in the Azure portal (refer to “Step 1: Create Your Azure Cognitive Service” for reference). You'll also define an instance of `ComputerVisionClient`, representing the Computer Vision client in C#:
Next, you'll define a private function and call it `Authenticate`. This function simply takes an instance of `ComputerVisionClient` and ties it to a key and endpoint.
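A minimal sketch of the fields and the `Authenticate` helper described above could look like this. It assumes version 7.0.1 of the Computer Vision package. The placeholder strings stand in for the endpoint and key you noted in Step 1, the parameter names `serviceEndpoint` and `serviceKey` are illustrative, and the `partial` modifier is there only so the fragment compiles on its own.

```csharp
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;

public static partial class OCRFunction
{
    // Placeholders (assumption): replace with the endpoint URL and Key 1 or Key 2 from Step 1.
    private static readonly string endpoint = "<your-endpoint-url>";
    private static readonly string key = "<your-key>";

    // One shared client instance, created when the class is first used.
    private static readonly ComputerVisionClient client = Authenticate(endpoint, key);

    // Builds a Computer Vision client bound to the given endpoint and key.
    private static ComputerVisionClient Authenticate(string serviceEndpoint, string serviceKey)
    {
        return new ComputerVisionClient(new ApiKeyServiceClientCredentials(serviceKey))
        {
            Endpoint = serviceEndpoint
        };
    }
}
```

Hardcoding the key keeps the tutorial short. For anything beyond a demo, reading it from the Function App's application settings is likely the safer choice.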
Then, you'll define another function and name it `ReadImage`. This function takes a `ComputerVisionClient` instance and an `IFormFile` (the ASP.NET Core type that represents a file uploaded in an HTTP request) and does the following:
- Creates a Stream from the IFormFile.
- Uses the Computer Vision client to call the `RecognizePrintedTextInStreamAsync` function, which calls the Azure Cognitive Service to perform the OCR operation. (Note: there are many other functions under `ComputerVisionClient` that you can explore yourself in Microsoft's docs).
- `RecognizePrintedTextInStreamAsync` returns an `ocrResult` object, which represents the result of the OCR operation on the image.
- The `ocrResult` object consists of multiple regions in a picture. For each region, there are multiple lines and, in each line, there are multiple words.
- The function calls another function (which you'll define soon) called `ExtractWordsFromOcrResult`. This function converts the `ocrResult` objects to a list of strings. Each string represents a line.
Next, you'll define a function called `ExtractWordsFromOcrResult`, which takes an `ocrResult` and iterates over the region to extract each line. From each line, it then extracts each word. The words are joined to form a line of text, and the lines are added to a `List<string>` that represents the lines in the region.
Note: for the sake of simplicity, this tutorial assumes that the image contains only one region.
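The two helpers described above, `ReadImage` and `ExtractWordsFromOcrResult`, can be sketched as follows. This is one possible shape, assuming version 7.0.1 of the Computer Vision package. Joining words with a single space and returning an empty list when no region is found are choices made for this sketch, and `partial` is used only so the fragment compiles on its own.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;

public static partial class OCRFunction
{
    // Sends one uploaded image to the OCR service and returns its lines of text.
    private static async Task<List<string>> ReadImage(ComputerVisionClient client, IFormFile file)
    {
        // IFormFile exposes the uploaded bytes as a stream.
        using Stream stream = file.OpenReadStream();

        // The first argument (detectOrientation) asks the service to detect
        // and correct the image orientation before recognizing text.
        OcrResult ocrResult = await client.RecognizePrintedTextInStreamAsync(true, stream);

        return ExtractWordsFromOcrResult(ocrResult);
    }

    // Flattens the first region of an OCR result into one string per line.
    private static List<string> ExtractWordsFromOcrResult(OcrResult ocrResult)
    {
        var lines = new List<string>();

        // The tutorial assumes a single region, so only the first one is read.
        OcrRegion region = ocrResult.Regions?.FirstOrDefault();
        if (region == null)
        {
            return lines; // no text detected
        }

        foreach (OcrLine line in region.Lines)
        {
            // Join the words of the line with single spaces.
            lines.Add(string.Join(" ", line.Words.Select(word => word.Text)));
        }

        return lines;
    }
}
```

To handle images with several regions, the `foreach` could be wrapped in an outer loop over `ocrResult.Regions`.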
Finally, you'll develop the actual Azure Function. The function does the following:
- Receives the images through an HTTP POST request.
- Parses the request form, reads each uploaded file as a .NET `IFormFile`, and uses the `ReadImage` function you defined earlier to perform the OCR operation. (Note: the function performs this for every file in the HTTP request, which enables batch operations, and returns the result to the requesting client.)
- Returns the JSON response containing the OCR results to the client.
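The steps above can be sketched as the function below. It assumes the in-process Azure Functions model, and it relies on the `client` instance and the `ReadImage` helper described earlier in this step being defined in the same class, so it is a fragment rather than a complete file. The response shape, one list of lines per uploaded file, is an assumption of this sketch.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

public static partial class OCRFunction
{
    [FunctionName("OCRFunction")]
    public static async Task<IActionResult> Run(
        // Anonymous matches the tutorial's setting; use a stricter level in production.
        [HttpTrigger(AuthorizationLevel.Anonymous, "post", Route = null)] HttpRequest req,
        ILogger log)
    {
        // Reject requests that are not form posts (for example, a raw JSON body).
        if (!req.HasFormContentType)
        {
            return new BadRequestObjectResult("Expected a form-data request with image files.");
        }

        IFormCollection form = await req.ReadFormAsync();
        var results = new List<List<string>>();

        // Run OCR on every uploaded file, which allows batch requests.
        foreach (IFormFile file in form.Files)
        {
            log.LogInformation("Running OCR on {FileName}", file.FileName);
            results.Add(await ReadImage(client, file)); // helper and field assumed from earlier
        }

        // OkObjectResult serializes the lists to JSON in the HTTP response.
        return new OkObjectResult(results);
    }
}
```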
Step 5: Deploy the OCR Function App to Azure
Now that you've developed the OCR function, it's time to deploy it to Microsoft Azure.
To start, right-click on the OCRSolution project in the Solution Explorer:
Click Publish to launch the publish wizard. Choose Azure as the publish target, since you are publishing to Microsoft Azure, and click Next. Then choose whether your Azure Function should run on Windows or Linux: select "Windows" and click Next. Note: in other scenarios, the choice depends on whether your code uses platform-specific functionality; it does not matter in this case.
Next, you'll choose the subscription, resource group, and the name of the Azure function you created in Microsoft Azure earlier (Note: make sure to log in using the same Microsoft account you used in the Microsoft Azure portal). This is where Visual Studio will deploy your Azure function. Then, click Finish:
After a short delay, you'll get a "Ready to publish" message, indicating that the wizard successfully created a publish profile for your Azure Function. Confirm the details and then hit Publish. Note: using the "Site" link provided at the bottom of the screen, you can double-check your publish configuration and the deployment URL. You'll want to make a note of the URL now, as you will use it again later.
Soon, you'll get an output in the Output window in Visual Studio to indicate that the Azure function was published successfully to Azure:
Step 6: Test the OCR Function App Using Postman
Finally, you'll test your OCR Function using the Postman client you downloaded earlier. Follow these steps to do so:
- Launch Postman and click +New Collection. Name your collection "OCRFunction."
- Under that collection, click Add requests and name your request "OCRFunction" as well.
- Click on the request on the left and change the request type to "POST."
- Paste your API URL, which you noted during deployment, appended with "/api/OCRFunction." For example: `https://ocr-function-app-0.azurewebsites.net/api/OCRFunction`.
- Click on the Body tab.
- Select "form-data" as your body type. This enables you to attach files in your HTTP request.
- Add a key named "File" and make sure the type is "File" as well.
- In the "VALUE" field, you can select one or multiple files. In this scenario, each file corresponds to an image. You'll upload three now, as found in the URLs below. Go ahead and download these images, inspect them, and upload them in the values field:
Finally, click Send to test your Azure function. Your OCR result should look like this:
You can find the final solution, along with the test images, in this GitHub repo. You can also find the full source code in the Appendix at the end of this article for easy reference.
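If you prefer to test from code rather than Postman, the same multipart request can be sent with `HttpClient`. The sketch below is a hypothetical console program, not part of the tutorial's solution. It assumes the example URL used in the Postman steps and takes the image paths as command-line arguments.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class OcrTestClient
{
    public static async Task Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("Usage: OcrTestClient <image-path> [<image-path> ...]");
            return;
        }

        // Assumption: replace with the URL of your own deployed function.
        const string functionUrl = "https://ocr-function-app-0.azurewebsites.net/api/OCRFunction";

        using var http = new HttpClient();
        using var form = new MultipartFormDataContent();

        // Attach every image under the form key "File", as in the Postman steps.
        foreach (string path in args)
        {
            var fileContent = new StreamContent(File.OpenRead(path));
            fileContent.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
            form.Add(fileContent, "File", Path.GetFileName(path));
        }

        HttpResponseMessage response = await http.PostAsync(functionUrl, form);
        response.EnsureSuccessStatusCode();

        // Print the raw JSON body returned by the function.
        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}
```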
Note: Once you’re finished, you’ll want to clean up the Azure resources you created in order to avoid unnecessary charges. To do so, complete the following steps:
- Go to the Microsoft Azure Portal.
- In the top search bar, type “ocr-rg” (or the name of the resource group you used, if you used a different name).
- Click on the resource group that pops up in the search result.
- Click on “Delete resource group.”
- You’ll get a prompt to type the resource group name to confirm; type “ocr-rg” (or the name of the resource group you used, if you used a different name).
- Click “Delete,” and in a few seconds, Microsoft Azure will delete the resources.
In this tutorial, you learned about Azure Cognitive Services, Azure Computer Vision, and Optical Character Recognition. You also learned about Azure Functions and how to use them to build serverless applications quickly and easily.
Next, you learned the architecture of the solution you built during the tutorial, which consists of an Azure Function API that receives an image, sends it to Azure Cognitive Services for OCR, and returns the JSON response to the client.
Finally, you learned how to create the solution components in Microsoft Azure and how to develop, deploy, and test Azure Functions.
It is worth mentioning that Azure Computer Vision has a rich SDK that can perform many operations. To learn more about it, visit Microsoft's documentation.
Since you are working with digital documents, an e-signature tool is a handy solution to easily and quickly sign your documents. Be sure to check out HelloSign to take your signatures to the next level!