OpenStack Private Cloud Architecture

Deploy a high-availability OpenStack 2024.2 private cloud with a 3-node control plane, spine-leaf networking, and Ceph storage using Kolla-Ansible.

Network Configuration

Configure four logically separated networks, each on its own VLAN. Use MTU 9000 (jumbo frames) on the tunnel and storage networks and MTU 1500 on the management and provider networks.

Network            Purpose                                VLAN / Subnet               MTU
-----------------  -------------------------------------  --------------------------  ----
Management         API, DB, Message Queue, SSH            VLAN 10 / 172.29.236.0/22   1500
Tunnel/Overlay     VXLAN/Geneve (Nova/Neutron)            VLAN 20 / 172.29.240.0/22   9000
Storage            Ceph replication, iSCSI                VLAN 30 / 172.29.244.0/22   9000
Provider/External  Tenant external access, Floating IPs   VLAN 40 / Public Range      1500
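
The address plan above can be sanity-checked before it is handed to the network team. The following is a minimal sketch using Python's standard ipaddress module; the NETWORKS dictionary and the helper names are illustrative and not part of any OpenStack tooling. The provider network is left out because its range is site-specific.

```python
import ipaddress
from itertools import combinations

# Subnets and MTUs copied from the table above (provider range omitted).
NETWORKS = {
    "management": ("172.29.236.0/22", 1500),
    "tunnel": ("172.29.240.0/22", 9000),
    "storage": ("172.29.244.0/22", 9000),
}


def overlapping_pairs(networks):
    """Return the names of any two networks whose subnets overlap."""
    parsed = {
        name: ipaddress.ip_network(cidr) for name, (cidr, _mtu) in networks.items()
    }
    return [
        (a, b) for a, b in combinations(parsed, 2) if parsed[a].overlaps(parsed[b])
    ]


def usable_hosts(cidr):
    """Number of assignable addresses, excluding network and broadcast."""
    return ipaddress.ip_network(cidr).num_addresses - 2


if __name__ == "__main__":
    print("overlaps:", overlapping_pairs(NETWORKS))
    for name, (cidr, mtu) in NETWORKS.items():
        print(f"{name}: {cidr} mtu={mtu} hosts={usable_hosts(cidr)}")
```

ipaddress.ip_network raises ValueError when host bits are set, so a mistyped subnet is caught as well.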

Hardware Requirements:

  • Use spine-leaf topology with 2x 25 GbE bonded (LACP) uplinks per compute node.
  • Enable MTU 9000 on all switches for Tunnel and Storage networks.
  • Configure MLAG or vPC on each leaf pair for switch-level redundancy.
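
Jumbo frames on the tunnel network matter because every tenant packet carries encapsulation headers. The sketch below estimates the largest MTU a tenant network can advertise. It assumes an IPv4 underlay and the overhead values Neutron commonly uses (30 bytes for VXLAN, 38 bytes for Geneve under OVN, plus a 20-byte IP header); verify these against your own Neutron configuration. The function and constant names are illustrative.

```python
# Assumed per-encapsulation overhead in bytes, excluding the outer IP header.
ENCAP_OVERHEAD = {"vxlan": 30, "geneve": 38}
IPV4_HEADER = 20


def tenant_mtu(underlay_mtu, encapsulation, ip_header=IPV4_HEADER):
    """Largest tenant MTU that fits inside one underlay frame."""
    return underlay_mtu - ENCAP_OVERHEAD[encapsulation] - ip_header


if __name__ == "__main__":
    for encap in ("vxlan", "geneve"):
        print(encap, tenant_mtu(9000, encap), tenant_mtu(1500, encap))
```

Under these assumptions a 1500-byte underlay leaves tenants with less than a standard Ethernet MTU, which is the reason for running the tunnel network at 9000.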

Control Plane Design

Deploy three controller nodes behind an HAProxy VIP managed by Keepalived.

                    +-----------------------+
                    |   HAProxy (VIP)       |
                    |   + Keepalived        |
                    +-----------+-----------+
                                |
        +-----------------------+-----------------------+
        |                       |                       |
+-------+-------+       +-------+-------+       +-------+-------+
| ctrl-01       |       | ctrl-02       |       | ctrl-03       |
| Keystone      |       | Keystone      |       | Keystone      |
| Nova-API      |       | Nova-API      |       | Nova-API      |
| Neutron-API   |       | Neutron-API   |       | Neutron-API   |
| Glance-API    |       | Glance-API    |       | Glance-API    |
| Cinder-API    |       | Cinder-API    |       | Cinder-API    |
| MariaDB       |       | MariaDB       |       | MariaDB       |
| RabbitMQ      |       | RabbitMQ      |       | RabbitMQ      |
+---------------+       +---------------+       +---------------+

Database (MariaDB Galera)

  • Configure a 3-node Galera Cluster with synchronous replication.
  • Set wsrep_cluster_address and wsrep_node_address in /etc/mysql/mariadb.conf.d/50-galera.cnf.
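
The reason for three nodes is quorum: a Galera partition stays writable only while it holds more than half of the cluster. The sketch below shows that arithmetic and builds a wsrep_cluster_address value in the gcomm:// form. The helper names are illustrative, and the host names are taken from the diagram above; substitute management IPs if name resolution is not guaranteed.

```python
def quorum_size(nodes):
    """Smallest group that is strictly more than half of the cluster."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """How many nodes can be lost while the remainder keeps quorum."""
    return nodes - quorum_size(nodes)


def gcomm_address(hosts):
    """Build a wsrep_cluster_address value from a list of cluster members."""
    return "gcomm://" + ",".join(hosts)


if __name__ == "__main__":
    controllers = ["ctrl-01", "ctrl-02", "ctrl-03"]
    print("quorum:", quorum_size(len(controllers)))
    print("tolerated failures:", tolerated_failures(len(controllers)))
    print("wsrep_cluster_address =", gcomm_address(controllers))
```

A fourth node would not raise the number of tolerated failures, which is why odd cluster sizes are the usual choice.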

Message Queue (RabbitMQ)

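  • Cluster three nodes and use quorum queues for nova and neutron (set rabbit_quorum_queue = true in the [oslo_messaging_rabbit] section of each service).

# /etc/rabbitmq/rabbitmq.conf
cluster_formation.peer_discovery_backend = classic_config
cluster_formation.classic_config.nodes.1 = rabbit@ctrl-01
cluster_formation.classic_config.nodes.2 = rabbit@ctrl-02
cluster_formation.classic_config.nodes.3 = rabbit@ctrl-03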

API Load Balancing

  • Terminate TLS at HAProxy and run active TCP/HTTP health checks against each controller on ports 8774 (Nova) and 9696 (Neutron).
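
To illustrate what an actively health-checked backend looks like, the sketch below renders one HAProxy backend stanza per API port. Kolla-Ansible generates the real HAProxy configuration itself, so this is only for reading or reviewing that output. The controller IPs, the backend names, and the check timings are assumptions chosen for the example.

```python
# Hypothetical management-network addresses for the three controllers.
CONTROLLERS = {
    "ctrl-01": "172.29.236.11",
    "ctrl-02": "172.29.236.12",
    "ctrl-03": "172.29.236.13",
}


def render_backend(name, port, controllers):
    """Render an HAProxy backend with an HTTP health check on every server."""
    lines = [
        f"backend {name}",
        "    mode http",
        "    balance roundrobin",
        "    option httpchk GET /",
    ]
    for host, ip in controllers.items():
        # "check" enables active probing; inter/rise/fall tune its sensitivity.
        lines.append(f"    server {host} {ip}:{port} check inter 2000 rise 2 fall 5")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_backend("nova_api", 8774, CONTROLLERS))
    print()
    print(render_backend("neutron_server", 9696, CONTROLLERS))
```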

Compute Node Design

Install nova-compute, ovn-controller, openvswitch, libvirtd, qemu-kvm, ceph-common, and telegraf.

Hardware Specifications:

  • CPU: 2x Intel Xeon Scalable or AMD EPYC (64+ cores total).
  • RAM: 512 GB – 1 TB DDR5 ECC.
  • Boot Disk: 2x 480 GB NVMe SSD (RAID 1).
  • NIC: 2x 25 GbE (Bonded LACP).

Nova Configuration

# /etc/nova/nova.conf
[DEFAULT]
cpu_allocation_ratio = 4.0
ram_allocation_ratio = 1.0
reserved_host_memory_mb = 8192
reserved_host_cpus = 4
compute_driver = libvirt.LibvirtDriver

[libvirt]
# Use hardware-accelerated KVM; "qemu" is software emulation only.
virt_type = kvm

[vnc]
server_listen = 0.0.0.0
server_proxyclient_address = <management_ip>

# Networking is provided by Neutron; connection settings go in the
# [neutron] section.

Storage Architecture

Deploy Ceph with 3 MON/MGR nodes and 5+ OSD nodes using NVMe for hot data and HDD for archival.

Storage Tiers (CRUSH Rules)

# Define CRUSH rules for NVMe (fast) and HDD (bulk) device classes.
# NVMe OSDs may be classed as "ssd" or "nvme"; check with: ceph osd crush class ls
ceph osd crush rule create-replicated fast-rule default host ssd
ceph osd pool create fast-volumes 128 128 replicated fast-rule
rbd pool init fast-volumes

ceph osd crush rule create-replicated bulk-rule default host hdd
ceph osd pool create bulk-volumes 128 128 replicated bulk-rule
rbd pool init bulk-volumes
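
The pg_num value of 128 used above is a starting point, not a rule. A long-standing rule of thumb is to target roughly 100 placement groups per OSD, divide by the replica count, and round up to a power of two. The sketch below applies it; treat the result as an estimate for all pools sharing a set of OSDs, and note that recent Ceph releases can adjust pg_num automatically through the PG autoscaler. The function name and default values are illustrative.

```python
def suggested_pg_count(osd_count, replicas=3, target_pgs_per_osd=100):
    """Rule-of-thumb PG total: OSDs x target / replicas, rounded up to 2^n."""
    raw = osd_count * target_pgs_per_osd / replicas
    power = 1
    while power < raw:
        power *= 2
    return power


if __name__ == "__main__":
    for osds in (3, 10, 40):
        print(f"{osds} OSDs -> {suggested_pg_count(osds)} PGs")
```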

# Map pools to Cinder backends
# /etc/cinder/cinder.conf
[DEFAULT]
enabled_backends = ceph-fast,ceph-bulk

[ceph-fast]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = ceph-fast
rbd_pool = fast-volumes

[ceph-bulk]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = ceph-bulk
rbd_pool = bulk-volumes

# Create volume types in Horizon or with the CLI
openstack volume type create fast-ssd --property volume_backend_name=ceph-fast
openstack volume type create bulk-hdd --property volume_backend_name=ceph-bulk

Security Architecture

  • Enforce TLS 1.3 at HAProxy and internal mTLS between services.
  • Use Keystone with an LDAP/AD identity backend or federated OIDC for authentication.
  • Implement security groups (enforced as OVN ACLs) and project isolation via OVN.
  • Store encryption keys and secrets in Barbican.

Monitoring and Operations

  • Deploy Prometheus, Grafana, Alertmanager, and Loki for metrics, visualization, alerting, and logs.
  • Monitor critical metrics: nova_hypervisor_vcpus_used, neutron_agent_state, ceph_osd_op_latency, rabbitmq_queue_messages_ready, and haproxy_frontend_queue_len.

Capacity Planning

  • vCPUs: (Logical CPUs - Reserved Host CPUs) × CPU Allocation Ratio
  • RAM: (Physical RAM - Reserved Host RAM) × RAM Allocation Ratio
  • Storage: (Total OSD Capacity / Replication Factor) × 0.85, which leaves headroom below Ceph's default nearfull ratio
  • Instances/Host: MIN(vCPUs Available / Flavor vCPUs, RAM Available / Flavor RAM), rounded down
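
The capacity formulas can be worked through with the values from the Nova configuration in this document. The sketch below assumes reserved resources are subtracted before the allocation ratio is applied, which is how the Placement service computes capacity. The Host class, the flavor size, and the 300 TB raw capacity are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    logical_cpus: int
    ram_mb: int
    cpu_allocation_ratio: float = 4.0
    ram_allocation_ratio: float = 1.0
    reserved_host_cpus: int = 4
    reserved_host_memory_mb: int = 8192


def schedulable_vcpus(host):
    """vCPUs available to instances after reservation and overcommit."""
    return int((host.logical_cpus - host.reserved_host_cpus) * host.cpu_allocation_ratio)


def schedulable_ram_mb(host):
    """RAM in MB available to instances after reservation and overcommit."""
    return int((host.ram_mb - host.reserved_host_memory_mb) * host.ram_allocation_ratio)


def usable_storage_tb(raw_tb, replicas=3, fill_target=0.85):
    """Usable Ceph capacity for a replicated pool, kept below the fill target."""
    return raw_tb / replicas * fill_target


def instances_per_host(host, flavor_vcpus, flavor_ram_mb):
    """Whole instances of one flavor that fit, limited by the scarcer resource."""
    return min(
        schedulable_vcpus(host) // flavor_vcpus,
        schedulable_ram_mb(host) // flavor_ram_mb,
    )


if __name__ == "__main__":
    host = Host(logical_cpus=64, ram_mb=512 * 1024)
    print("vCPUs:", schedulable_vcpus(host))
    print("RAM (MB):", schedulable_ram_mb(host))
    print("4 vCPU / 8 GB instances per host:", instances_per_host(host, 4, 8192))
    print("Usable TB from 300 TB raw:", usable_storage_tb(300.0))
```

With a RAM allocation ratio of 1.0, memory becomes the limiting resource for RAM-heavy flavors, so it is worth running the calculation for each flavor you expect to be common.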

Deployment Tools

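  • Use Kolla-Ansible for containerized deployments and rapid upgrades. With Kolla-Ansible, place service overrides such as nova.conf and cinder.conf under /etc/kolla/config/ instead of editing files on the hosts directly.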