Custom Operator (starting from scratch)
Controller Runtime: The Kubernetes controller-runtime project is a set of Go libraries for building controllers.
Operator SDK: The Operator SDK is a framework that uses the controller-runtime library to make writing operators easier.
https://sdk.operatorframework.io/docs/building-operators/ansible/tutorial/
https://itnext.io/a-practical-kubernetes-operator-using-ansible-an-example-d3a9d3674d5b
https://two-oes.medium.com/building-custom-ansible-based-operator-for-openshift-4-ec681fa0466d
Example:
operator-sdk init --domain operator.redhatgov.io --plugins ansible
operator-sdk create api --group workshops --version v1alpha1 --kind Workshop --generate-role
These commands scaffold the project directory structure, including an Ansible role for the Workshop kind (with a defaults file for role variables).
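A rough sketch of the generated layout (exact contents vary by operator-sdk version):
Dockerfile
Makefile
PROJECT
watches.yaml                # maps the Workshop kind to the Ansible role
requirements.yml
config/                     # CRDs, RBAC, manager manifests, sample CRs
roles/workshop/
  defaults/main.yml         # default values for role variables
  tasks/main.yml            # reconciliation logic run for each Workshop CR
playbooks/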
Creating CRD:
kubectl create -f config/crd/bases/workshops.operator.redhatgov.io_workshops.yaml
Run the custom operator:
command: make deploy
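make deploy installs the CRDs and runs the controller manager in the workshop-system namespace. If you build and push your own operator image first, the scaffolded Makefile accepts an IMG variable, roughly like this (the registry path below is a made-up example):
make docker-build docker-push IMG=quay.io/example/workshop-operator:v0.0.1
make deploy IMG=quay.io/example/workshop-operator:v0.0.1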
Create a custom resource now:
-----
apiVersion: workshops.operator.redhatgov.io/v1alpha1
kind: Workshop
metadata:
  name: example-workshop
spec:
  # Add fields here
  cr_my_replicas: 1
kubectl create -f config/samples/workshops_v1alpha1_workshop.yaml   # the generated sample CR (edit it to match the spec above)
You can check the logs of the custom operator:
kubectl logs -f workshop-controller-manager-9f6ff675b-rbcqw -n workshop-system
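When the CR is created, the operator runs the Ansible role against it, exposing spec fields as role variables (cr_my_replicas here). A minimal sketch of roles/workshop/tasks/main.yml that consumes the variable might look like this (the deployment name and image are hypothetical):
---
# Sketch only: create/update a Deployment sized by the CR's cr_my_replicas field
- name: Ensure example deployment matches the requested replica count
  kubernetes.core.k8s:
    definition:
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: "{{ ansible_operator_meta.name }}-app"
        namespace: "{{ ansible_operator_meta.namespace }}"
      spec:
        replicas: "{{ cr_my_replicas }}"
        selector:
          matchLabels:
            app: "{{ ansible_operator_meta.name }}-app"
        template:
          metadata:
            labels:
              app: "{{ ansible_operator_meta.name }}-app"
          spec:
            containers:
              - name: app
                image: nginx:latest
The operator logs from the previous command should show this role running whenever the CR changes.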
In the RBAC role.yaml (config/rbac/role.yaml) I added the rule below (wildcard permissions are convenient for testing, but should be scoped down for production):
### added by prakash ###
- apiGroups:
  - "*"
  resources:
  - "*"
  verbs:
  - "*"
GCP Data Life Cycle
The data lifecycle has four main steps:
1. Ingest - pull in the raw data
2. Store - persist the data in a durable, easily accessible format
3. Process and analyze - transform the data into actionable information
4. Explore and visualize - derive insights from the analysis and share them
Store
Cloud Storage:
* Backing up and archiving
* Storage and delivery of content
* For Hadoop and Spark jobs, data from Cloud Storage can be natively accessed by using Dataproc.
* BigQuery natively supports importing CSV, JSON, and Avro files from a specified Cloud Storage bucket.
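For example, a CSV export in a bucket can be loaded into a BigQuery table with the bq CLI (the dataset, table, and bucket names below are made up):
bq load --source_format=CSV --autodetect my_dataset.my_table gs://my-bucket/exports/data.csv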
Cloud Storage for Firebase:
* good fit for storing and retrieving assets such as images, audio, video, and other user-generated content in mobile and web apps.
Cloud SQL:
* fully managed, cloud-native RDBMS that offers both MySQL and PostgreSQL engines with built-in support for replication.
* offers built-in backup and restoration, high availability, and read replicas.
* Cloud SQL supports RDBMS workloads up to 30 TB for both MySQL and PostgreSQL
* Data stored in Cloud SQL is encrypted both in transit and at rest
* For OLTP workloads, Cloud SQL is appropriate
* For OLAP workloads, consider BigQuery
* If your workload requires dynamic schemas, consider Datastore.
* You can use Dataflow or Dataproc to create ETL jobs that pull data from Cloud SQL and insert it into other storage systems.
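As a quick illustration of the built-in high availability and read replicas mentioned above (instance names, tier, and region are arbitrary examples):
gcloud sql instances create my-primary --database-version=POSTGRES_14 --tier=db-custom-2-7680 --region=us-central1 --availability-type=REGIONAL
gcloud sql instances create my-replica --master-instance-name=my-primary --region=us-central1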
Bigtable: Managed wide-column NoSQL
* managed, high-performance NoSQL database service designed for terabyte- to petabyte-scale workloads
* Use case:
1) Real-time app data
2) Stream processing (pub/sub => dataflow(transform) => BigTable)
3) IoT time series data (sensor/streamed data => Bigtable (time series schema))
4) AdTech workloads (can be used to store and track ad impressions, which Dataproc and Dataflow can then process and analyze)
5) Data ingestion (Cloud Storage => Dataflow/Dataproc => Bigtable)
6) Analytical workloads (Bigtable => Dataflow (complex aggregation) => Dataproc, which can execute Hadoop or Spark processing and machine-learning tasks)
7) Apache HBase replacement
Note: While Bigtable is considered an OLTP system, it doesn't support multi-row transactions, SQL queries, or joins. For those use cases, consider either Cloud SQL or Datastore.
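As an illustration of the time-series pattern from use case 3, the cbt CLI can populate a table whose row keys combine a device ID and a timestamp (instance, table, and column names are made up):
cbt -instance=my-instance createtable sensor-data
cbt -instance=my-instance createfamily sensor-data readings
cbt -instance=my-instance set sensor-data device42#2024-01-01T00:00 readings:temperature=21.5
cbt -instance=my-instance read sensor-data prefix=device42#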
Spanner: Horizontally scalable relational database
1) Financial services (strong consistency across read/write operations without sacrificing high availability)
Firestore: Flexible, scalable NoSQL database
* to build a real-time experience serving millions of users without compromising responsiveness.
* Use Cases:
1) Chat and social (store and retrieve images, audio, video, and other user-generated content)
Storing data warehouse data
BigQuery: Managed data warehouse
* You can store data directly in BigQuery for analysis.
* Supports loading data through the web interface, command-line tools, and REST API calls.
* When loading data in bulk, the data should be in the form of CSV, JSON, or Avro files.
* For streaming data, you can use Pub/Sub and Dataflow in combination to process incoming streams and store the resulting data in BigQuery.
* In some workloads, however, it might be appropriate to stream data directly into BigQuery without additional processing.
* To derive business value and insights from data, you must transform and analyze it.
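As a quick illustration of the direct-streaming option mentioned above, newline-delimited JSON rows can be streamed into an existing table with the bq CLI (dataset, table, and file names are made up):
bq insert my_dataset.events events.json
bq insert is meant for small tests; high-volume streaming typically goes through Pub/Sub and Dataflow as noted above.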
Processing large-scale data
* Large-scale data processing typically involves reading data from source systems such as Cloud Storage, Bigtable, or Cloud SQL, and then conducting complex normalizations or aggregations of that data.
* In many cases, the data is too large to fit on a single machine so frameworks are used to manage distributed compute clusters and to provide software tools that aid processing.
Dataproc: Managed Apache Hadoop and Apache Spark
* Spark has gained popularity over the past few years as an alternative to Hadoop MapReduce
* With Dataproc, you can migrate your existing Hadoop or Spark deployments to a fully-managed service that automates cluster creation, simplifies configuration and management of your cluster, has built-in monitoring and utilization reports, and can be shut down when not in use.
* reduces the operational and cost overhead of managing a Spark or Hadoop deployment
* Dataproc provides the ease and flexibility to spin up Spark or Hadoop clusters on demand when they are needed, and to terminate clusters when they are no longer needed.
* simplifies operational activities such as installing software or resizing a cluster
* With Dataproc, you can natively read data and write results in Cloud Storage, Bigtable, or BigQuery, or the accompanying HDFS storage provided by the cluster.
Use cases:
* Log processing
* Reporting (Aggregate data into reports and store the data in BigQuery)
* On-demand Spark clusters
* Machine learning
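A rough sketch of the on-demand cluster pattern with the gcloud CLI (cluster name, region, and the example job are illustrative):
gcloud dataproc clusters create ephemeral-cluster --region=us-central1 --num-workers=2
gcloud dataproc jobs submit spark --cluster=ephemeral-cluster --region=us-central1 --class=org.apache.spark.examples.SparkPi --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar -- 1000
gcloud dataproc clusters delete ephemeral-cluster --region=us-central1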
Dataflow: Serverless, fully managed batch and stream processing
* Lets you analyze streaming data and respond in real time.
* Handles both batch and streaming analytics.
* Historically, supporting both modes increased complexity by necessitating two different pipelines.
* Dataflow simplifies big data for both streaming and batch workloads by unifying the programming model and the execution model.
* Instead of having to specify a cluster size and manage capacity, Dataflow is a managed service where on-demand resources are created, autoscaled, and parallelized.
* As a true zero-ops service, workers are added or removed based on the demands of the job.
Use cases:
1) MapReduce replacement
2) User analytics (Analyze high-volume user-behavior data, such as in-game events, click stream data, and retail sales data.)
3) Data Science (Process large amounts of data to make scientific discoveries and predictions, such as genomics, weather, and financial data.)
4) ETL: Ingest, transform, and load data into a data warehouse, such as BigQuery.
5) Log processing: Process continuous event-log data processing to build real-time dashboards, app metrics, and alerts.
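As a small illustration, a Google-provided Dataflow template can be launched from the CLI with no cluster to manage (the job name and output bucket are made up):
gcloud dataflow jobs run wordcount-example --gcs-location=gs://dataflow-templates/latest/Word_Count --region=us-central1 --parameters=inputFile=gs://dataflow-samples/shakespeare/kinglear.txt,output=gs://my-bucket/results/output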
GCP Helicopter Racing League
Overview
- Regional league
- Offers a paid service to stream the races all over the world with live telemetry and predictions throughout each race.
Solution Concept
- migrate their existing service to a new platform
- to expand their use of managed AI and ML services to facilitate race predictions.
- they want to move the serving of their content, both real-time and recorded, closer to their users.
Existing Technical Environment
- public cloud-first company
- core of their mission-critical applications runs on their current public cloud provider.
- Video recording and editing is performed at the race tracks, and the content is encoded and transcoded, where needed, in the cloud.
- Enterprise-grade connectivity and local compute is provided by truck-mounted mobile data centers.
- Their race prediction services are hosted exclusively on their existing public cloud provider.[CloudML/Tensorflow/MLWorkflow]
- Existing content is stored in an object storage service on their existing public cloud provider.[Google Storage bucket]
- Video encoding and transcoding is performed on VMs created for each job.
- Race predictions are performed using TensorFlow running on VMs in the current public cloud provider.[CloudML/Tensorflow]
Business Requirement
HRL’s owners want to expand their predictive capabilities and reduce latency for their viewers in emerging markets. Their requirements are:
- Support ability to expose the predictive models to partners. [Cloud Endpoints]
- Increase predictive capabilities during and before races: (race result / mechanical failure / crowd sentiment) [Data Ingestion / Data Storage / Processing]
- Increase telemetry and create additional insights.
- Measure fan engagement with new predictions.
- Enhance global availability and quality of the broadcasts.
- Increase the number of concurrent viewers.
- Minimize operational complexity.
- Ensure compliance with regulations.
- Create a merchandising revenue stream. [Cloud Endpoints]
Technical Requirements
- Maintain or increase prediction throughput and accuracy.
- Reduce viewer latency.
- Increase transcoding performance.
- Create real-time analytics of viewer consumption patterns and engagement.
- Create a data mart to enable processing of large volumes of race data
Executive Statement
- enhanced video streams [AutoML Video Intelligence/Video Intelligence API]
- to include predictions of events within the race (e.g., overtaking).
- Our current platform allows us to predict race outcomes but lacks the facility to support real-time predictions during races and the capacity to process season-long results.
=======================Our Analysis==========================
- Streaming ==>> Data Processing
- Predictions ==>> Machine Learning
- Enterprise-grade connectivity ==>> Holistic security: enterprise-grade necessitates a holistic approach to security across products, processes, and applications.