...
Info |
---|
Only available on the EUMETSAT side. Currently in beta testing: the service is provided on a best-effort basis for testing purposes, so use it with caution. |
Introduction
Many problems in the Earth Observation and modelling communities require a common processing algorithm to be applied independently to thousands (or millions!) of pieces of input data. Doing this across many processing nodes is known as "High Throughput Computing", as opposed to "High Performance Computing", which concentrates on running large jobs that will not fit on a single machine across a pool of processing nodes, typically using MPI.
...
Of course, any tenant can install their own batch processing systems for their own purposes with their own resources, but will not be able to take advantage of other shared resources in a centrally organised way.
General
EWC HTCondor is a managed service. The central manager node is deployed in a tenancy on the EWC. Users can join the existing pool by adding compute (execute) and submit nodes.
...
Some features of the HTCondor in EWC:
Maintenance | Centrally Managed Tenancy, easy 'one click' deployment |
Deployment | Multi tenancy |
Resource | Nodes join the main HTCondor pool automatically; no password or configuration is needed, you only choose the plan for the machine you want to add |
Usage | Easy 'one click' deployment, simple examples for running a job with docker universe |
Network | VPN, which allows processing nodes in a tenancy to communicate with the scheduler / master nodes |
Scheduling | A single scheduler in each tenancy; jobs belonging to other tenancies cannot be removed |
Execute nodes
- No access to execute host for containers
- No access to other containers running on execute node
- Isolated environment for containers
- No autoscaling
- No NFS
Submit nodes
- Only docker universe allowed
- Only condor_submit command allowed
- Private network in the tenancy enabled to allow access to tenancy-internal resources/files
- Condor transfer mechanism allowed
Deploy HTCondor nodes
Pre-requisite
Before deploying an HTCondor node, you need to create an HTCondor-specific security group. See the page Creating Security Groups in Morpheus for instructions on creating security groups.
Create a security group named htcondor with the following rules:
Rule name | Direction | Rule Type | Protocol | Port Range | Source Type | Source | Destination Type |
---|---|---|---|---|---|---|---|
| | egress | Custom Rule | TCP | All | Instance | | |
| | egress | Custom Rule | UDP | All | Instance | | |
9618-tcp | ingress | Custom Rule | TCP | 9618 | Network | 100.64.0.0/10 | Instance |
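The ingress rule above only admits traffic on port 9618 from the 100.64.0.0/10 range. To see which addresses that range covers, here is a minimal Python sketch using only the standard library. The helper name `in_htcondor_source_range` and the sample addresses are illustrative assumptions, not part of the EWC setup.

```python
import ipaddress

# Source range copied from the 9618-tcp ingress rule in the table above.
HTCONDOR_SOURCE_RANGE = ipaddress.ip_network("100.64.0.0/10")


def in_htcondor_source_range(address: str) -> bool:
    """Return True if `address` lies inside the range admitted by the ingress rule."""
    return ipaddress.ip_address(address) in HTCONDOR_SOURCE_RANGE


if __name__ == "__main__":
    # Hypothetical sample addresses, chosen only to show both outcomes.
    for sample in ("100.64.0.1", "100.127.255.254", "100.128.0.1", "10.0.0.5"):
        print(sample, in_htcondor_source_range(sample))
```

A /10 prefix starting at 100.64.0.0 spans 100.64.0.0 to 100.127.255.255, so addresses from a tenancy's own private network (for example in 10.0.0.0/8) fall outside this rule.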
Deploy execute or submit node
- Go to Provisioning → Instances and click Add+ to add a new instance.
- Select Htcondor Submit/Execute node.
- Fill in the required data:
  - plan: choose your plan
  - network: private
  - security group: htcondor, plus ssh (submit node only)
- Finalize the provisioning steps.
...

    # dockertest.sub -- example docker job
    universe                = docker
    docker_image            = debian
    executable              = /bin/cat
    arguments               = /etc/hosts
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    log                     = log/job_$(Process)_sleep.log
    output                  = output/job_$(Process)_output.txt
    error                   = error/job_$(Process)_errors.txt
    request_cpus            = 1
    request_memory          = 1024M
    request_disk            = 10240K
    queue 10010

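The submit file above writes its log, output and error files into `log/`, `output/` and `error/` subdirectories. HTCondor does not normally create such directories itself, so they should exist before submission. The following Python sketch collects them from the text of a submit file and creates them. The function names and the simple `key = value` parsing are assumptions made for illustration, and the parser does not cover the full submit description syntax.

```python
import os


def output_directories(submit_text: str) -> set:
    """Collect the parent directories of the log/output/error paths in a submit file."""
    dirs = set()
    for line in submit_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and statements such as "queue".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.lower() in {"log", "output", "error"}:
            parent = os.path.dirname(value)
            if parent:
                dirs.add(parent)
    return dirs


def create_output_directories(submit_text: str, base_dir: str = ".") -> None:
    """Create every directory found by output_directories() under base_dir."""
    for directory in sorted(output_directories(submit_text)):
        os.makedirs(os.path.join(base_dir, directory), exist_ok=True)
```

Calling `create_output_directories(open("dockertest.sub").read())` from the directory that holds the submit file would create the three directories if they are missing.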
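- Submit the job with condor_submit <job_name>, where <job_name> is the submit description file (dockertest.sub in the example above)
- Verify that the jobs are running with the condor_q command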
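The two steps above can also be scripted. This minimal Python sketch assumes that condor_submit prints its usual summary line of the form `N job(s) submitted to cluster M.`. The helpers `parse_submit_output` and `submit` are hypothetical names, not part of HTCondor.

```python
import re
import subprocess

# Assumed summary line, e.g. "10 job(s) submitted to cluster 42."
_SUBMITTED = re.compile(r"(\d+) job\(s\) submitted to cluster (\d+)")


def parse_submit_output(text: str) -> tuple:
    """Return (number_of_jobs, cluster_id) parsed from condor_submit output."""
    match = _SUBMITTED.search(text)
    if match is None:
        raise ValueError("no submission summary found in condor_submit output")
    return int(match.group(1)), int(match.group(2))


def submit(submit_file: str) -> tuple:
    """Run condor_submit on submit_file and return (number_of_jobs, cluster_id)."""
    result = subprocess.run(
        ["condor_submit", submit_file],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if condor_submit fails
    )
    return parse_submit_output(result.stdout)
```

The returned cluster id can then be passed to condor_q (for example `condor_q 42`) to list only the jobs from that submission.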
Once the execute node is up, you can check from a submit node that it has joined the pool:
- SSH into your submit node
- Run condor_status and check that the node appears in the list of execute nodes
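The check above can be automated along these lines. The sketch assumes that `condor_status -autoformat Machine` prints one machine name per slot. The function names and host names are hypothetical.

```python
import subprocess


def machines_from_status(status_output: str) -> set:
    """Return the distinct machine names found in the condor_status output."""
    return {line.strip() for line in status_output.splitlines() if line.strip()}


def node_in_pool(status_output: str, hostname: str) -> bool:
    """Check whether hostname (short or fully qualified) is listed in the pool."""
    short = hostname.split(".")[0]
    return any(
        machine == hostname or machine.split(".")[0] == short
        for machine in machines_from_status(status_output)
    )


def query_pool() -> str:
    """Run condor_status on the submit node and return its raw output."""
    result = subprocess.run(
        ["condor_status", "-autoformat", "Machine"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if condor_status fails
    )
    return result.stdout
```

On a submit node, `node_in_pool(query_pool(), "my-execute-node")` would report whether a node with that (hypothetical) name has joined. A machine with several slots appears several times in the raw output, which is why the names are collected into a set.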
Test HTCondor
Docker universe job in HTCondor
Try this tutorial on how to create a container and push it to a registry using Docker. It also provides an example job that can be submitted to HTCondor. TBD