Inference is the process of using a trained model to make predictions on new data; its performance is measured by throughput, latency, GPU memory usage, and GPU power usage. The models examined are BERT and ResNet50. Choosing the right configuration maximises inference performance, so a configuration analysis must be carried out to determine which configuration suits each model. The main challenge is that this analysis is inherently time-consuming and complex. To simplify it, an automation programme was built. The programme analyses the BERT model's inference configurations by evaluating 10 candidates, bert-large_config_0 to bert-large_config_9; the best configuration is bert-large_config_2, which achieves a throughput of 12.8 infer/sec at a latency of 618 ms. The ResNet50 model is divided into 5 configurations, resnet50_config_0 to resnet50_config_4; the best is resnet50_config_1, which achieves a throughput of 120.6 infer/sec at a latency of 60.9 ms. The automation programme thus simplifies the process of analysing inference configurations.
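
As a rough illustration of what such an automation programme might look like, the Python sketch below sweeps a list of named configurations, records throughput and latency for each, and reports the best one. This is a minimal sketch under assumptions: the `measure_config` helper, the `Result` structure, and the selection rule (highest throughput, ties broken by lower latency) are hypothetical placeholders, not the paper's actual tooling, and the random metrics exist only so the example runs end to end.

```python
import random
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    throughput: float  # inferences per second
    latency_ms: float  # mean request latency in milliseconds


def measure_config(name: str) -> Result:
    # Hypothetical stand-in for the real benchmark step: deploy the
    # model under configuration `name`, drive test traffic against it,
    # and read back the metrics. Random values are used here purely so
    # the sketch is self-contained and runnable.
    return Result(name,
                  throughput=random.uniform(5.0, 15.0),
                  latency_ms=random.uniform(400.0, 900.0))


def sweep(config_names: list[str]) -> Result:
    # Measure every candidate configuration and keep the one with the
    # highest throughput, breaking ties by lower latency.
    results = [measure_config(n) for n in config_names]
    return max(results, key=lambda r: (r.throughput, -r.latency_ms))


if __name__ == "__main__":
    # Ten BERT candidates, mirroring the naming used in the text.
    bert_configs = [f"bert-large_config_{i}" for i in range(10)]
    best = sweep(bert_configs)
    print(f"best: {best.name} "
          f"({best.throughput:.1f} infer/sec, {best.latency_ms:.0f} ms)")
```

The same sweep would apply unchanged to the five ResNet50 candidates (resnet50_config_0 to resnet50_config_4); only the list of configuration names differs.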