Machine Learning Hosting: Open Source vs Commercial
We need to explore existing solutions in the following areas:
- ML model hosting solutions
- Supported model formats
- MLflow provides several standard flavors that may be useful in your applications. Many of MLflow's deployment tools support these flavors, so you can export your own model in one of these flavors to benefit from all of those tools (https://www.mlflow.org/docs/latest/models.html#models); a minimal save/load sketch follows the list below:
  - Python Function (python_function)
  - R Function (crate)
  - H2O (h2o)
  - Keras (keras)
  - MLeap (mleap)
  - PyTorch (pytorch)
  - Scikit-learn (sklearn)
  - Spark MLlib (spark)
  - TensorFlow (tensorflow)
  - ONNX (onnx)
  - MXNet Gluon (gluon)
  - XGBoost (xgboost)
  - LightGBM (lightgbm)
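To make the flavor convention concrete, here is a minimal sketch, assuming MLflow and scikit-learn are installed; the model choice and the "iris_model" output directory are illustrative, not anything MLflow mandates:

```python
# Sketch: save a scikit-learn model as an MLflow Model, then reload it
# through the generic python_function flavor. "iris_model" is an
# illustrative directory name.
import mlflow.pyfunc
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

# Writes a directory containing the pickled model plus an MLmodel
# descriptor listing both the sklearn and python_function flavors.
mlflow.sklearn.save_model(model, "iris_model")

# Any deployment tool that understands python_function can now use it.
loaded = mlflow.pyfunc.load_model("iris_model")
print(loaded.predict(X[:5]))
```

Because the saved directory advertises the python_function flavor in its MLmodel descriptor, generic tools (the REST scoring server, the cloud deployment targets) can consume it without knowing it was built with scikit-learn.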
- Kubeflow supports two model serving systems that allow multi-framework model serving: KFServing and Seldon Core. Alternatively, you can use a standalone model serving system. The Kubeflow serving overview (https://www.kubeflow.org/docs/components/serving/overview/) compares the options so that you can choose the framework that best supports your model serving requirements; a sample client request sketch follows the comparison table below.
- KFServing and Seldon Core share some technical features, including explainability (using Seldon Alibi Explain) and payload logging, among other areas.
- A commercial product, Seldon Deploy, supports both KFServing and Seldon Core in production.
- KFServing is part of the Kubeflow project ecosystem; Seldon Core is an external project supported within Kubeflow.
| Feature | Sub-feature | KFServing | Seldon Core |
|---|---|---|---|
| Framework | TensorFlow | ✓ sample | ✓ docs |
| | XGBoost | ✓ sample | ✓ docs |
| | scikit-learn | ✓ sample | ✓ docs |
| | NVIDIA TensorRT Inference Server | ✓ sample | ✓ docs |
| | ONNX | ✓ sample | ✓ docs |
| | PyTorch | ✓ sample | ✓ |
| Graph | Transformers | ✓ sample | ✓ docs |
| | Combiners | Roadmap | ✓ sample |
| | Routers including MAB | Roadmap | ✓ docs |
| Analytics | Explanations | ✓ sample | ✓ docs |
| Scaling | Knative | ✓ sample | |
| | GPU AutoScaling | ✓ sample | |
| | HPA | ✓ | ✓ docs |
| Custom | Container | ✓ sample | ✓ docs |
| | Language Wrappers | | ✓ Python, Java, R |
| | Multi-Container | | ✓ docs |
| Rollout | Canary | ✓ sample | ✓ docs |
| | Shadow | | ✓ |
| Istio | | ✓ | ✓ |
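To illustrate the consumer side, here is a hedged sketch of calling a model served by KFServing via its V1 data plane, which follows the TensorFlow Serving-style "instances" JSON format; the host, model name, and feature values are invented placeholders, and a real cluster will typically also need ingress routing or a Host header:

```python
# Sketch: client request against KFServing's V1 prediction endpoint.
# URL, model name, and inputs below are hypothetical placeholders.
import requests

url = "http://kfserving.example.com/v1/models/sklearn-iris:predict"
payload = {"instances": [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]]}

resp = requests.post(url, json=payload, timeout=10)
resp.raise_for_status()
print(resp.json())  # expected shape: {"predictions": [...]}
```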
Automated updates/re-training
- During model development, hyperparameters are often hard to tune. Tuning hyperparameters is critical for model performance and accuracy, and configuring them manually is time consuming. Kubeflow's hyperparameter tuner (Katib) provides an automated way to meet your objectives. This automation can save days of model-testing compute time (freeing up valuable infrastructure) and speed up the delivery of improved models.
Using the hosted model
Experimentation with training an ML model
- Rapid experimentation is critical to building high-quality machine learning models quickly. Kubeflow offers a user interface (UI) that allows you to track and compare experiments, so you can later decide which experiment performed best and use it as the basis for your next steps. On top of that, Kubeflow 1.0 provides stable software sub-systems for model training, including Jupyter notebooks and popular ML training operators such as TensorFlow and PyTorch, which run efficiently and securely in isolated Kubernetes namespaces. The ML training operators simplify the configuration and operation of scaled-out ML training tasks. In addition, Kubeflow has delivered Critical User Journeys (CUJs), such as build, train, and deploy, which provide end-to-end workflows that speed development. You can read more about the CUJs in the Kubeflow roadmap.
- The development of ML models can require hybrid and multi-cloud portability and secure sharing between teams, clusters, and clouds. Kubeflow is supported by all major cloud providers and is available for on-premises installation. If you need to develop on your laptop, train with GPUs on your on-prem cluster, and serve in the cloud, Kubeflow provides the portability to support fast experimentation, rapid training, and robust deployment in the same or different environments with minimal operational overhead.
- MLflow Models offer a convention for packaging machine learning models in multiple flavors, and a variety of tools to help you deploy them. Each Model is saved as a directory containing arbitrary files and a descriptor file that lists several "flavors" the model can be used in. For example, a TensorFlow model can be loaded as a TensorFlow DAG, or as a Python function to apply to input data. MLflow provides tools to deploy many common model types to diverse platforms: for example, any model supporting the "Python function" flavor can be deployed to a Docker-based REST server, to cloud platforms such as Azure ML and AWS SageMaker, and as a user-defined function in Apache Spark for batch and streaming inference. If you output MLflow Models using the Tracking API, MLflow also automatically remembers which Project and run they came from.
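As a sketch of the Spark batch-inference path mentioned above (PySpark and MLflow assumed installed; "iris_model" and "features.parquet" are illustrative placeholders):

```python
# Sketch: apply a python_function model to a Spark DataFrame as a UDF.
import mlflow.pyfunc
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("batch-scoring").getOrCreate()

# Wrap the saved MLflow Model as a Spark UDF for batch scoring.
predict = mlflow.pyfunc.spark_udf(spark, "iris_model")

df = spark.read.parquet("features.parquet")  # hypothetical feature table
scored = df.withColumn("prediction", predict(*df.columns))
scored.show()
```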
dev/production and CI features
- Kubeflow currently doesn't have a dedicated tool for this purpose, but its users have been using the Pipelines component, and it has worked well for them. Kubeflow Pipelines can be used to create reproducible workflows. These workflows automate the steps needed to build an ML workflow, which delivers consistency, saves iteration time, and helps with debugging, auditability, and compliance requirements.
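To illustrate, a minimal sketch using the kfp v1 SDK; the images and step commands are placeholders rather than a real training workload:

```python
# Sketch: a two-step Kubeflow Pipeline compiled with the kfp v1 SDK.
import kfp
from kfp import dsl

def preprocess_op():
    return dsl.ContainerOp(
        name="preprocess",
        image="python:3.8-slim",
        command=["python", "-c", "print('preprocessing data')"],
    )

def train_op():
    return dsl.ContainerOp(
        name="train",
        image="python:3.8-slim",
        command=["python", "-c", "print('training model')"],
    )

@dsl.pipeline(name="train-pipeline", description="Reproducible build/train flow")
def train_pipeline():
    preprocess = preprocess_op()
    train_op().after(preprocess)  # enforce step ordering

# Emits a workflow spec that can be uploaded through the Pipelines UI.
kfp.compiler.Compiler().compile(train_pipeline, "train_pipeline.yaml")
```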
- Machine learning requires experimenting with a wide range of datasets, data preparation steps, and algorithms to build a model that maximizes some target metric. Once you have built a model, you also need to deploy it to a production system, monitor its performance, and continuously retrain it on new data and compare it with alternative models.
- Moreover, although individual ML libraries provide solutions to some of these problems (for example, model serving), to get the best result you usually want to try multiple ML libraries. MLflow lets you train, reuse, and deploy models with any library and package them into reproducible steps that other data scientists can use as a "black box," without even having to know which library you are using.
- In addition to continuous experimentation, components like MLflow allow the tracking and storage of metrics, parameters, and artifacts, which are not only critical to enabling that continuous experimentation loop, but also support responsible and sustainable systems from a governance perspective.
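A minimal tracking sketch; the run name, parameter, and metric values are illustrative:

```python
# Sketch: logging params, metrics, and an artifact with MLflow Tracking.
import mlflow

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("learning_rate", 0.01)   # hyperparameter
    mlflow.log_metric("rmse", 0.73)           # evaluation result
    with open("notes.txt", "w") as f:
        f.write("baseline run notes")
    mlflow.log_artifact("notes.txt")          # arbitrary supporting file
```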
Consider the following
- Entire platforms vs. smaller tools (one for hosting, one for training, etc.) that need to be brought together
- Easy, repeatable, portable deployments on diverse infrastructure (for example, experimenting on a laptop, then moving to an on-premises cluster or to the cloud)
- MLflow Projects are a standard format for packaging reusable data science code. Each project is simply a directory with code or a Git repository, and uses a descriptor file or simply convention to specify its dependencies and how to run the code. For example, projects can contain a conda.yaml file for specifying a Python Conda environment. When you use the MLflow Tracking API in a Project, MLflow automatically remembers the project version (for example, the Git commit) and any parameters. You can easily run existing MLflow Projects from GitHub or your own Git repository, and chain them into multi-step workflows.
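For example, a sketch of launching a project straight from Git; the URI points at MLflow's public example project, whose MLproject file declares the alpha parameter:

```python
# Sketch: run a packaged MLflow Project from a Git repository.
import mlflow

submitted = mlflow.projects.run(
    uri="https://github.com/mlflow/mlflow-example",
    parameters={"alpha": 0.5},
)
print("run id:", submitted.run_id)
```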
- MLflow is a single Python package that covers some key steps in model management. Kubeflow is a combination of open-source libraries that depends on a Kubernetes cluster to provide a computing environment for ML model development and production tools.
Commercial vs open source
- KubeFlow = open source (https://www.kubeflow.org/docs/about/kubeflow/)
- MLFlow = open source (https://www.mlflow.org/docs/latest/index.html#)
Usage Costs
- Since both MLflow and Kubeflow are open source, there is no licence fee for the software itself; you will only be able to quantify costs (compute, storage, operations) once you start running the products.
- Kubeflow, AI Hub, and notebooks can be used at no charge. Pricing for Google's managed services (AI Platform Training, AI Platform Predictions, Compute Engine, Google Kubernetes Engine, BigQuery, and Cloud Storage) is published on GCP's pricing pages, and the GCP pricing calculator can be used to estimate the costs of running your workloads.
- When considering Azure pricing, keep in mind that costs depend on the types of products the development team needs. The hourly server cost can range from $0.099 to $0.149 per hour. Measured per instance, prices may not seem consistent, but they are comparable to AWS and GCP once you factor in the price per GB of RAM. As the main enterprise cloud providers compete for business, prices remain competitive across the board.
- Google Cloud follows to-the-minute pricing. While GCP may fall behind in additional features, it compensates in cost efficiency. The platform also offers pay-as-you-go pricing, billed per second of usage. Setting GCP apart, it offers discounts for long-term usage that start after the first month.
- Many experts recommend that enterprises evaluate their public cloud needs on a case-by-case basis and match specific applications and workloads with the vendor that offers the best fit for their needs. Each of the leading vendors has strengths and weaknesses that make them a good choice for specific projects.
Compatibility with our current stack
- KubeFlow works with GCP (https://www.kubeflow.org/docs/gke/deploy/project-setup/)
- MLflow can be configured to work on GCP (https://medium.com/weareservian/deploying-an-ml-model-using-gcp-and-mlflow-27084989f98)
Degree of complexity
KubeFlow
You need some knowledge of the following systems and tools:
- Kustomize lets you customize raw, template-free YAML files for multiple purposes, leaving the original YAML untouched and usable as is.
- kustomize targets Kubernetes; it understands and can patch Kubernetes-style API objects. It's like make, in that what it does is declared in a file, and it's like sed, in that it emits edited text.
- If you plan to deploy Kubeflow on an existing Kubernetes cluster, review the Kubernetes system requirements listed below.
The effort required to maintain / degree of automation (open question)
The Kubeflow mission
Our goal is to make scaling machine learning (ML) models and deploying them to production as simple as possible, by letting Kubernetes do what it’s great at:
- Deploying and managing loosely coupled microservices
- Scaling based on demand
Because ML practitioners use a diverse set of tools, one of the key goals is to customize the stack based on user requirements (within reason) and let the system take care of the “boring stuff”. While we have started with a narrow set of technologies, we are working with many different projects to include additional tooling.
Ultimately, we want to have a set of simple manifests that give you an easy to use ML stack anywhere Kubernetes is already running, and that can self-configure based on the cluster it deploys into.
Minimum system requirements for an existing Kubernetes environment
The Kubernetes cluster must meet the following minimum requirements:
- Your cluster must include at least one worker node with a minimum of:
  - 4 CPUs
  - 50 GB storage
  - 12 GB memory
- The recommended Kubernetes version is 1.14. Kubeflow has been validated and tested on Kubernetes 1.14.
- Your cluster must run at least Kubernetes version 1.11.
- Kubeflow does not work on Kubernetes 1.16.
- Older versions of Kubernetes may not be compatible with the latest Kubeflow versions. The following matrix provides information about compatibility between Kubeflow and Kubernetes versions.
Kubernetes Versions | Kubeflow 0.4 | Kubeflow 0.5 | Kubeflow 0.6 | Kubeflow 0.7 | Kubeflow 1.0 |
---|---|---|---|---|---|
1.11 | compatible | compatible | incompatible | incompatible | incompatible |
1.12 | compatible | compatible | incompatible | incompatible | incompatible |
1.13 | compatible | compatible | incompatible | incompatible | incompatible |
1.14 | compatible | compatible | compatible | compatible | compatible |
1.15 | incompatible | compatible | compatible | compatible | compatible |
1.16 | incompatible | incompatible | incompatible | incompatible | incompatible |
Note: I could not find why the latest Kubeflow is incompatible with the latest Kubernetes; the Kubeflow repository's issue tracker contains nothing explaining the incompatibility.
KubeFlow Cloud Installation
Instructions for installing Kubeflow on a public cloud (IaaS considerations):
- Get Kubeflow running on Amazon Web Services (AWS)
- Get Kubeflow running on Microsoft Azure
- Get Kubeflow running on Google Cloud Platform (GCP)
- Get Kubeflow running on IBM Cloud Kubernetes Service (IKS)
https://www.kubeflow.org/docs/started/kubeflow-overview/
https://www.mlflow.org/docs/latest/quickstart.html
https://www.infoq.com/presentations/mlflow-databricks/
https://developer.ibm.com/articles/first-impressions-mlflow/
https://github.com/mlflow/mlflow/tree/master/examples/docker
https://www.mlflow.org/docs/latest/projects.html#kubernetes-execution
Installing KubeFlow
- From Zero to KubeFlow (https://youtu.be/AF-WH967_s4)
- Running KubeFlow on GCP
- KubeFlow Roadmap