I am preparing a new version of the algorithm we use for benchmarking APIs that we include in the Streamdata.io API performance gallery. Historically we've calculated percentages for client bandwidth savings, server bandwidth savings, and server CPU savings for each API. But is API performance just about that? As I rework the benchmark tool, and establish a more automated set of APIs for managing the process, I'm taking a fresh look at the inputs and outputs necessary to measure and understand the APIs we are showcasing. I am keeping most of the variables in the refresh of our benchmark algorithm, but I'm looking to reboot it, automate it a little bit, and possibly give it a new name--Streamrank.
This is just a draft of the algorithm, but I wanted to tell the story, in hopes of making sure it makes sense.
After I document and sign up for API access, I poll each API at a 60-second interval, for at least 24 hours (if possible), recording the following for each response:
- Download Size
- What is the API response download size when I poll an API?
- Download Time
- What is the API response download time when I poll an API?
- Did It Change
- Did the response change from the previous response when I poll an API?
I track this data for each API response, record results in a database, and then I take the average across the targeted time period. It seems like a good indicator of the API performance.
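To make the polling measurements concrete, here is a minimal sketch of how each response could be summarized and averaged; the function and field names are my own assumptions, not the actual benchmark code.

```python
import hashlib
import statistics

def record_poll(body, elapsed, previous_hash=None):
    """Summarize one polled response: download size, download time,
    and whether the body changed since the previous poll."""
    digest = hashlib.sha256(body).hexdigest()
    return {
        "size": len(body),    # Download Size
        "time": elapsed,      # Download Time
        "changed": previous_hash is not None and digest != previous_hash,
        "hash": digest,
    }

def summarize_polling(observations):
    """Average size and time across the test window, plus the share of
    polls (after the first) whose response actually changed."""
    changed = sum(1 for o in observations[1:] if o["changed"])
    return {
        "avg_size": statistics.mean(o["size"] for o in observations),
        "avg_time": statistics.mean(o["time"] for o in observations),
        "pct_changed": 100.0 * changed / max(len(observations) - 1, 1),
    }
```

Hashing the body is just one cheap way to answer "did it change"; a byte-for-byte comparison would work as well.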
Next, I take the same API, proxy it with Streamdata.io, and run the stream for as long as I was polling--ideally, simultaneously with polling.
- Download Size
- What is the initial response, and incremental download size for each JSON patch in stream?
- Download Time
- What is the download time of the initial response, and of each incremental JSON Patch in the stream?
I track this data for each API response, record results in a database, and then I take the average across the targeted time period.
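The streaming side can be averaged the same way. This sketch assumes the initial full response counts as the first update and each JSON Patch as a subsequent one--my reading of the setup, not a documented detail:

```python
def summarize_streaming(initial_size, initial_time, patches):
    """patches: list of (size_bytes, seconds) pairs, one per JSON Patch
    received over the stream. The initial full response is counted as
    the first update."""
    sizes = [initial_size] + [size for size, _ in patches]
    times = [initial_time] + [secs for _, secs in patches]
    return {
        "avg_size": sum(sizes) / len(sizes),
        "avg_time": sum(times) / len(times),
    }
```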
Once I have the base numbers for polling, and for streaming, I figure out the percentage difference between the average polling and streaming values for size and time--the percentage of change is calculated based upon changes from polling only.
- What is the percentage difference between the average polling and streaming size of responses?
- What is the percentage difference between the average polling and streaming time of responses?
- What is the percentage of change for polling responses?
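The comparisons above reduce to a simple percentage-difference calculation; this sketch assumes savings are expressed relative to the polling baseline:

```python
def pct_savings(polling_avg, streaming_avg):
    """Percentage saved by streaming, relative to the polling baseline."""
    if polling_avg == 0:
        return 0.0
    return 100.0 * (polling_avg - streaming_avg) / polling_avg
```

For example, if polling averaged 400 bytes per request and streaming averaged 100 bytes per update, `pct_savings(400, 100)` reports 75.0 percent bandwidth savings.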
Once I have these numbers I translate them into my three main Streamrank scores, helping communicate the efficiencies that Streamdata.io brings to the table:
- Bandwidth Savings
- How much bandwidth can be saved?
- Processor Savings
- How much processor time can be saved?
- Real Time Savings
- How real time is an API resource? This is what amplifies the savings.
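One possible mapping from the measurements to these three scores--purely illustrative, since this draft doesn't pin down the weighting--is to use size savings for bandwidth, time savings as a proxy for processor load, and the change rate for real time:

```python
def streamrank_scores(size_savings_pct, time_savings_pct, change_pct):
    """Illustrative assignment of the measured percentages to the three
    Streamrank scores; the real algorithm may weight these differently."""
    return {
        "bandwidth_savings": round(size_savings_pct, 1),
        "processor_savings": round(time_savings_pct, 1),
        "real_time_savings": round(change_pct, 1),
    }
```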
While real time is definitely a factor here, Streamrank is meant to demonstrate that Streamdata.io is about more than just real time streaming of data, and can deliver some serious efficiency gains.
Details of Test
We are interested in being as transparent about Streamrank as we possibly can, which is why I'm telling this story, and why we will be publishing other details of the test for each API:
- Date of Test
- When did we last run the test? I'm hoping we can do this regularly.
- Duration of Test
- What was the overall duration of the benchmark testing?
- Polling Frequency
- How often are we polling? Default is 60 seconds.
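These published details could be captured in a small record alongside each run; the field names and placeholder values here are hypothetical:

```python
from dataclasses import dataclass, asdict

@dataclass
class TestDetails:
    """Metadata published with each Streamrank benchmark run."""
    date_of_test: str                     # ISO 8601 timestamp of the last run
    duration_hours: float                 # overall duration of the benchmark
    polling_frequency_seconds: int = 60   # default polling interval

details = TestDetails(date_of_test="2018-01-01T00:00:00Z", duration_hours=24.0)
```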
The accuracy and impact of the test varies from API to API. There are many different dimensions to how an API can be polled or streamed, and oftentimes the tests hit their mark, while other times they can be pretty inconclusive. The goal isn't to be 100% right, it is just to help quantify how we can think differently about the resources we are serving up with our APIs.
I have this draft benchmark in place, and wrapped as an API. I can pass it an OpenAPI for any API we want to test, and it will poll and stream for the designated time period, and calculate the results. As I run it against more of the APIs we are profiling I will continue to adjust and dial things in, looking for other ways I can improve upon the algorithm. The goal is to be transparent with the algorithm, as well as the results, and encourage API providers and consumers to get involved, and help us improve the results to better understand API performance.
I'd also like to find ways of better articulating the results, such as applying the bandwidth and processor savings to AWS, Azure, and Google Cloud Platform pricing--demonstrating how using Streamdata.io can pencil out financially. The process has already taught me a lot about how efficient, or inefficient, APIs can be, and we are hoping that it will do the same with the API providers we are introducing to our services, and the API consumers we are targeting with the Streamdata.io API Gallery.