Running AI Prototyping Projects

Now that we’ve covered what AI prototyping projects are and how to launch them, let’s talk about how to get started on one.

While artificial intelligence is a broad and nebulous term, I find that AI projects benefit from keeping things as simple as possible - especially at the beginning.

Low-fidelity Prototyping

One of the best ways of conveying the vision of an AI system is through low-fidelity prototypes. A low-fidelity prototype involves simulating the application experience through visual aids.

Low-fidelity prototypes can include paper prototypes, where you sketch the individual screens of your application and how it responds to an input such as a chat message or an uploaded photo.

Paper prototypes are easy to make and simple to understand. Additionally, because they’re literally sketched on paper, a paper prototype is not likely to be misinterpreted as a completed system. However, they can leave some ambiguity as far as styling goes.

A whiteboard sketch of a conversational AI system

More complex prototypes can be created using software like Figma or Balsamiq. These tools let business users build functional user interface concepts with drag-and-drop tooling. The resulting screens can be arranged into a sequence that illustrates sample interactions and how the system might behave, which helps you communicate the vision of a software system to a developer team or to other members of the organization.

A more polished mockup of the same AI system

However, it’s important to note that while these low-fidelity prototypes illustrate your application’s flow and concept, they do not deal with the realities of artificial intelligence since they were created by human intelligence.

To evaluate what’s technically possible - and to discover the limitations of the systems you may work with - you’ll need to create a technical prototype, ideally by focusing on the simplest possible implementation first.

Simplest possible implementation

With an AI project, you may be tempted to train a custom model to solve a particular problem you’re facing, but many AI as a Service offerings can help you prototype your idea with passable performance to determine if your approach is viable.

For example, if I were building an AI system to use a camera and describe what it sees, I could work on training computer vision models to recognize specific objects using a large quantity of reference photos from different angles and different lighting conditions. Such an undertaking would require a significant amount of time and effort.

Alternatively, I could use a pre-trained computer vision model, or a system wrapped around such a model like Azure AI Computer Vision. These services let you take advantage of models others have already trained to extract insights from your inputs, and you can then run custom logic based on what the model sees in the image.

Azure Computer Vision's pre-trained model recognizing objects in an image
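
As a rough sketch of what that looks like in code, here’s how you might call Azure AI Vision’s image analysis from Python. The endpoint, key, and file name are placeholders, and the package and method names come from the azure-ai-vision-imageanalysis SDK as I remember it - double-check the current docs before relying on them.

```python
# Minimal sketch using the azure-ai-vision-imageanalysis package
# (pip install azure-ai-vision-imageanalysis). Names are from memory; verify against current docs.
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key - swap in your own Azure AI Vision resource
client = ImageAnalysisClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

with open("room.jpg", "rb") as f:  # hypothetical image file
    image_data = f.read()

# Ask the pre-trained model for a caption and the objects it detects
result = client.analyze(
    image_data=image_data,
    visual_features=[VisualFeatures.CAPTION, VisualFeatures.OBJECTS],
)

print(f"Caption: {result.caption.text} ({result.caption.confidence:.0%})")
for detected_object in result.objects.list:
    tag = detected_object.tags[0]
    print(f"Detected {tag.name} ({tag.confidence:.0%})")
```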

For example, if I were building an app for people with accessibility challenges, I could use the computer vision model to extract information about the image and mix it with a custom prompt telling a language model to talk about trip hazards such as rugs, cables, or items on the ground.
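
One way to wire that together is to feed the caption and detected objects into a text-only language model with a prompt focused on trip hazards. The sketch below uses the openai Python package; the model name, the example inputs, and the describe_trip_hazards helper are all assumptions for illustration.

```python
# Hypothetical sketch: mix pre-trained vision output with a custom prompt.
# `caption` and `objects` are assumed to come from a vision call like the one above.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def describe_trip_hazards(caption: str, objects: list[str]) -> str:
    prompt = (
        "You are assisting a user with low vision. Based on this description of the room, "
        "point out likely trip hazards such as rugs, cables, or items on the ground.\n\n"
        f"Scene caption: {caption}\n"
        f"Detected objects: {', '.join(objects)}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model would do for a prototype
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# Example inputs standing in for real vision results
print(describe_trip_hazards("a living room with a couch", ["rug", "power cable", "backpack"]))
```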

Alternatively, with the advent of multi-modal models, you could use a model like GPT-4o and send it your image along with a request to highlight trip hazards. Such a request would likely give you a superior result to using computer vision on its own because the model would have the additional context of your raw image.
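
Here’s roughly what that multi-modal approach looks like with the openai package, sending the raw image as a base64 data URL alongside the trip-hazard request. The file name and prompt wording are assumptions; treat this as a sketch rather than a drop-in implementation.

```python
# Sketch of the multi-modal approach: send the raw image plus the request to GPT-4o.
import base64

from openai import OpenAI

client = OpenAI()

with open("room.jpg", "rb") as f:  # hypothetical image file
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this room for someone with low vision, highlighting "
                            "trip hazards such as rugs, cables, or items on the floor.",
                },
                # The image travels with the prompt as a base64-encoded data URL
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```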

Contrast this approach with training a custom image model. While that training project might ultimately provide better results for users, it would take a significant amount of time, effort, storage, and computing resources to build such a model - and the model might not be as effective in all lighting conditions or environments. These inaccuracies would require additional images and training time to overcome, potentially resulting in months of work before you had a viable model of your own to deploy.

A diagram of data being used to train a model which can interpret new images

Conversely, you could use a pre-built model with a custom prompt to evaluate the feasibility of your approach - and of a product that uses it - before making the decision to train your own model.

Illustrating failure points

One of the things you’ll need to handle in a prototype is what happens when the system fails.

Failure in AI systems can come from a variety of factors:

  • The system housing your model may be offline or inaccessible due to networking issues
  • You may encounter rate limits or intermittent error responses from external APIs
  • Your system may generate a response, but with low confidence in its correctness
  • You may deliver a response and the user may find an issue with something your system said or did

Each of these scenarios represents a different way something could fail, and each of them requires a slightly different response.

In scenarios where you are relying on an external resource that is unresponsive, rate limited, or returning errors, your system will need to handle the failure gracefully and surface it to the user in a human-readable manner. For example, a rate limit error in a chat application could be handled with a response indicating that the system is experiencing a high volume of traffic and that the user should try again after a specific period of time (typically included in the rate limit error response).
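
In code, that might look like catching the rate limit error, reading the suggested wait time, and turning it into a friendly message while retrying behind the scenes. This sketch uses the openai package’s RateLimitError; the model name, the fallback wait time, and the exact retry policy are assumptions you’d adjust for your own system.

```python
# Sketch of surfacing a rate limit error in a human-readable way.
import time

from openai import OpenAI, RateLimitError

client = OpenAI()


def ask_with_backoff(prompt: str, max_attempts: int = 3) -> str:
    for attempt in range(max_attempts):
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",  # assumed model for the prototype
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except RateLimitError as error:
            # Rate limit responses often include a retry-after header; fall back to a guess if not
            wait_seconds = int(error.response.headers.get("retry-after", 30))
            if attempt == max_attempts - 1:
                return (
                    "We're experiencing a high volume of traffic right now. "
                    f"Please try again in about {wait_seconds} seconds."
                )
            time.sleep(wait_seconds)


print(ask_with_backoff("Summarize today's support tickets."))
```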

If a system generates a response but with low confidence, you can choose to filter out the response entirely or to indicate to the user that the system is unsure. For example, if you show a computer vision system a photo and the system is 80% certain that the photo contains a dog and 20% certain it contains a squirrel, you may want to remark only on the dog and omit the low-confidence objects from the results.
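
A prototype-grade version of that filter can be as simple as a threshold on the confidence score. The 0.5 cutoff and the sample detections below are arbitrary assumptions you’d tune for your own system.

```python
# Only report detections we're reasonably confident about.
CONFIDENCE_THRESHOLD = 0.5  # arbitrary cutoff; tune for your own system

# (label, confidence) pairs as they might come back from a vision model
detections = [("dog", 0.80), ("squirrel", 0.20)]

confident = [(label, score) for label, score in detections if score >= CONFIDENCE_THRESHOLD]

if confident:
    print("I can see: " + ", ".join(label for label, _ in confident))
else:
    print("I'm not sure what's in this image.")
```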

Alternatively, you could redirect low confidence scores to a queue for a human to process. For example, in a particular system Leading EDJE helped a client implement, if the confidence score was below a certain threshold, the item being processed was redirected to a queue for manual review by a human.
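
The same threshold can route items instead of discarding them. Here’s a rough sketch of that idea with an in-memory queue standing in for whatever review mechanism your system actually uses; the threshold, item names, and process_result helper are all hypothetical.

```python
# Sketch of routing low-confidence results to a manual review queue.
from queue import Queue

REVIEW_THRESHOLD = 0.7  # arbitrary; anything below this goes to a human

manual_review_queue: Queue = Queue()


def process_result(item_id: str, label: str, confidence: float) -> None:
    if confidence >= REVIEW_THRESHOLD:
        print(f"{item_id}: automatically labeled as '{label}' ({confidence:.0%})")
    else:
        # A human makes the final call on anything the model isn't sure about
        manual_review_queue.put({"item_id": item_id, "suggested_label": label, "confidence": confidence})
        print(f"{item_id}: sent to manual review ({confidence:.0%} confidence)")


process_result("invoice-001", "approved", 0.92)
process_result("invoice-002", "approved", 0.55)
```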

Finally, there may be cases where your system simply gets it wrong. In these scenarios you should make it clear to users how to report and resolve these issues. In conversational AI systems this might be as simple as offering “thumbs down” feedback on a specific response. In other applications, it might involve appealing an automated decision the system made so that a human is in the loop and can make an appropriate decision based on the information available.

While it’s normal to only handle the “happy path” during prototyping, prototypes do break - and they particularly like to break during demos. I recommend you at least add basic error handling to your application and have a plan for handling additional types of failures.

In the next article in this series, we'll talk more about how to handle risks, uncertainty, and weak areas when developing AI prototypes.
