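1. Overview
1.1 Background
In modern enterprise IT architectures, infrastructure complexity has grown dramatically. We have to manage cloud virtual machines, container clusters, and database instances alongside physical servers and network devices in on-premises data centers. No single automation tool covers all of this: Terraform excels at infrastructure provisioning but falls short on configuration management and application deployment; Ansible is strong at configuration management and orchestration but cannot manage the full lifecycle of cloud resources.
The environment our team manages has more than 500 cloud hosts spread across AWS, Azure, and Alibaba Cloud, plus over 200 physical servers in our own data center. We first tried to manage everything with Terraform and soon found application deployment, configuration updates, and patch management very cumbersome. After bringing in Ansible, we ran into state-management problems when creating and destroying cloud resources. After more than a year of practice, we settled on a set of best practices for hybrid Terraform + Ansible orchestration.
The core idea is simple: Terraform manages the infrastructure lifecycle (create, update, destroy), and Ansible manages configuration and application deployment. Terraform "builds the house"; Ansible "decorates it and moves in". The two integrate through dynamic inventories, output variables, and the local-exec provisioner to form a complete automation system.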
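1.2 Technical Features
Declarative infrastructure management: Terraform describes infrastructure in HCL (HashiCorp Configuration Language), tracks resource changes in a state file, and supports plan previews before any change is applied. Compared with imperative scripts, declarative definitions are easier to maintain and audit.
Idempotent configuration management: a well-written Ansible playbook is idempotent, so running the same tasks repeatedly has no side effects. This matters for fixing configuration drift and for disaster recovery, where you need to re-run a playbook without worrying about breaking the existing configuration.
Unified multi-cloud abstraction: Terraform has providers for 200+ cloud vendors, so the same HCL syntax and workflow manage AWS EC2, Azure VMs, and Alibaba Cloud ECS (the resource definitions themselves remain provider-specific). Ansible's cloud modules also cover the mainstream platforms, and the two together make multi-cloud management practical.
State management and locking: the Terraform state file records the current state of the infrastructure. Remote backends (S3, Consul, Terraform Cloud) and state locking (DynamoDB, Consul) prevent conflicts when a team works on the same infrastructure.
Modularity and reuse: Terraform modules and Ansible roles package common infrastructure patterns and configuration tasks so that they can be reused across projects.
Dynamic inventory integration: Ansible can build its host list from cloud platforms, Terraform outputs, a CMDB, and other data sources, so there is no static inventory file to maintain by hand and the inventory follows infrastructure changes automatically.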
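To make the dynamic-inventory idea concrete, here is a minimal sketch of turning Terraform outputs into the JSON structure that Ansible accepts from an inventory script. It assumes an output named `web_server_ips` (the name used in the production example later in this article); the group name, the `ansible_user` value, and the function name `build_inventory` are illustrative assumptions, not part of either tool.

```python
import json


def build_inventory(tf_outputs, group="web_servers", user="ec2-user"):
    """Convert parsed `terraform output -json` data into the JSON layout
    Ansible expects from an inventory script called with --list."""
    # Each Terraform output is an object that carries its data under "value".
    ips = tf_outputs["web_server_ips"]["value"]
    return {
        group: {"hosts": list(ips)},
        # Providing _meta.hostvars lets Ansible skip one --host call per host.
        "_meta": {"hostvars": {ip: {"ansible_user": user} for ip in ips}},
    }


if __name__ == "__main__":
    # Sample data shaped like `terraform output -json`; the IPs are placeholders.
    sample = {
        "web_server_ips": {
            "sensitive": False,
            "value": ["54.123.45.67", "54.123.45.68"],
        }
    }
    print(json.dumps(build_inventory(sample), indent=2))
```

In a real setup the `sample` dictionary would come from running `terraform output -json` in the environment directory and parsing its standard output.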
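1.3 Use Cases
Multi-cloud and hybrid architectures: the company uses several cloud platforms plus on-premises data centers and needs one automation toolchain. In our practice, Terraform manages VPCs, subnets, and security groups across clouds, and Ansible deploys applications uniformly, which greatly reduces the complexity of multi-cloud management.
Large-scale server deployment: hundreds of servers must be provisioned and configured quickly. Terraform creates cloud hosts in parallel, Ansible discovers the new hosts through the dynamic inventory and configures them, and the whole process can finish within 30 minutes.
Immutable infrastructure: following the infrastructure-as-code approach, every change goes through a code commit and a CI/CD pipeline. Terraform's plan preview shows the blast radius before a change, and Ansible's check mode verifies that a configuration change is safe.
Disaster recovery and environment cloning: an entire environment must be reproduced quickly in another region or cloud. Because all infrastructure and configuration live in Git, rebuilding in a new environment only requires changing a few parameters. We once completed a cross-region migration of our whole production environment within 2 hours.
Compliance and auditing: industries such as finance and healthcare have strict audit requirements for infrastructure changes. Every change is recorded as a Git commit, the Terraform state file provides a complete resource inventory, and Ansible logs record each step, which together satisfy compliance audits.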
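1.4 Environment Requirements
| Component | Version | Notes |
|---|---|---|
| Terraform | 1.5+ | 1.6+ recommended for better performance and features |
| Ansible | 2.14+ | 2.15+ recommended (ansible-core version); requires Python 3.9+ |
| Python | 3.9+ | Ansible runtime; install SDKs such as boto3 and azure |
| Git | 2.30+ | Version control |
| AWS CLI | 2.x | For managing AWS resources (optional) |
| Azure CLI | 2.x | For managing Azure resources (optional) |
| Cloud accounts | - | AWS/Azure/GCP/Alibaba Cloud accounts with the necessary IAM permissions |
Recommended hardware:
| Environment | Control node | Managed nodes | Notes |
|---|---|---|---|
| Development | 2 vCPU / 4 GB | 1 vCPU / 2 GB × 5 | For feature validation and learning |
| Testing | 4 vCPU / 8 GB | 2 vCPU / 4 GB × 20 | For load and integration testing |
| Production | 8 vCPU / 16 GB | As needed | The control node should be highly available |
Network requirements:
The control node needs SSH access to the managed nodes (TCP port 22)
If you use a Terraform remote backend, the control node needs access to S3, Consul, or the relevant service
If you use Ansible Tower/AWX, you need access to its web interface (HTTP/HTTPS)
Managed nodes need access to package repositories (yum/apt)
Permission requirements:
Terraform: administrator rights on the cloud platform, or a fine-grained IAM policy (create, modify, and delete permissions for EC2, VPC, RDS, S3, and so on)
Ansible: SSH access to the managed nodes; key-based authentication is recommended over passwords
State storage: if S3 is the Terraform backend, read/write access to the S3 bucket and to the DynamoDB lock table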
2. Detailed Steps
2.1 Preparation
2.1.1 System Check
# Check the operating system version
cat /etc/os-release
# Check the Python version (3.9+ required)
python3 --version
# Check disk space
df -h
# Check network connectivity
ping -c 3 registry.terraform.io
ping -c 3 github.com
2.1.2 Installing Terraform
# Download Terraform (1.6.6 used as an example)
wget https://releases.hashicorp.com/terraform/1.6.6/terraform_1.6.6_linux_amd64.zip
# Unpack and install
unzip terraform_1.6.6_linux_amd64.zip
sudo mv terraform /usr/local/bin/
# Verify the installation
terraform version
# Expected output:
# Terraform v1.6.6
# on linux_amd64
# Enable shell completion
terraform -install-autocomplete
2.1.3 Installing Ansible
# Install Python dependencies
sudo apt update
sudo apt install -y python3-pip python3-venv
# Create a virtual environment (recommended)
python3 -m venv ~/ansible-venv
source ~/ansible-venv/bin/activate
# Install Ansible (2.15.8 is an ansible-core version number)
pip3 install ansible-core==2.15.8
# Install cloud platform SDKs
pip3 install boto3 botocore   # AWS
pip3 install azure-cli        # Azure
pip3 install google-auth      # GCP
# Verify the installation
ansible --version
# Expected output:
# ansible [core 2.15.8]
#   config file = None
#   configured module search path = ['/home/user/.ansible/plugins/modules']
#   ansible python module location = /home/user/ansible-venv/lib/python3.9/site-packages/ansible
#   ansible collection location = /home/user/.ansible/collections
#   executable location = /home/user/ansible-venv/bin/ansible
#   python version = 3.9.x
2.1.4 Configuring Cloud Credentials
AWS credentials:
# Install the AWS CLI
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
# Configure AWS credentials
aws configure
# AWS Access Key ID: YOUR_ACCESS_KEY
# AWS Secret Access Key: YOUR_SECRET_KEY
# Default region name: us-east-1
# Default output format: json
# Verify the configuration
aws sts get-caller-identity
# Expected output:
# {
#   "UserId": "AIDAXXXXXXXXXX",
#   "Account": "123456789012",
#   "Arn": "arn:aws:iam::123456789012:user/terraform"
# }
Azure credentials:
# Install the Azure CLI
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
# Log in to Azure
az login
# Set the default subscription
az account set --subscription "YOUR_SUBSCRIPTION_ID"
# Create a service principal (used by Terraform)
az ad sp create-for-rbac --name "terraform-sp" --role "Contributor" --scopes "/subscriptions/YOUR_SUBSCRIPTION_ID"
# Record the appId, password, and tenant values from the output
2.1.5 Initializing the Project Structure
# Create the project directory
mkdir -p ~/infra-automation/{terraform,ansible,scripts}
cd ~/infra-automation
# Create the Terraform directory layout
mkdir -p terraform/{modules,environments/{dev,staging,prod}}
# Create the Ansible directory layout
mkdir -p ansible/{inventories,playbooks,roles,group_vars,host_vars}
# Create the Git repository
git init
cat > .gitignore <
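2.2 Core Configuration
2.2.1 Configuring the Terraform Remote Backend
Use S3 and DynamoDB for state storage and locking:
# File: terraform/backend.tf
terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
    # Versioning is a property of the bucket, not a backend argument
    # (a "versioning" key here makes terraform init fail);
    # enable it on the bucket itself with the AWS CLI.
  }
  required_version = ">= 1.5.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}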
Create the S3 bucket and the DynamoDB table:
# Create the S3 bucket
aws s3api create-bucket \
  --bucket my-terraform-state-bucket \
  --region us-east-1
# Enable versioning
aws s3api put-bucket-versioning \
  --bucket my-terraform-state-bucket \
  --versioning-configuration Status=Enabled
# Enable encryption
aws s3api put-bucket-encryption \
  --bucket my-terraform-state-bucket \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "AES256"
      }
    }]
  }'
# Create the DynamoDB lock table
aws dynamodb create-table \
  --table-name terraform-state-lock \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region us-east-1
2.2.2 Creating Terraform Modules
VPC module example:
# File: terraform/modules/vpc/main.tf
variable "vpc_cidr" {
description = "VPC CIDR block"
type = string
default = "10.0.0.0/16"
}
variable "environment" {
description = "Environment name"
type = string
}
variable "availability_zones" {
description = "List of availability zones"
type = list(string)
default = ["us-east-1a", "us-east-1b", "us-east-1c"]
}
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${var.environment}-vpc"
Environment = var.environment
ManagedBy = "Terraform"
}
}
resource "aws_subnet" "public" {
count = length(var.availability_zones)
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index)
availability_zone = var.availability_zones[count.index]
map_public_ip_on_launch = true
tags = {
Name = "${var.environment}-public-subnet-${count.index + 1}"
Environment = var.environment
Type = "public"
}
}
resource "aws_subnet" "private" {
count = length(var.availability_zones)
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
availability_zone = var.availability_zones[count.index]
tags = {
Name = "${var.environment}-private-subnet-${count.index + 1}"
Environment = var.environment
Type = "private"
}
}
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.environment}-igw"
Environment = var.environment
}
}
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = {
Name = "${var.environment}-public-rt"
Environment = var.environment
}
}
resource "aws_route_table_association" "public" {
count = length(aws_subnet.public)
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
output "vpc_id" {
description = "VPC ID"
value = aws_vpc.main.id
}
output "public_subnet_ids" {
description = "Public subnet IDs"
value = aws_subnet.public[*].id
}
output "private_subnet_ids" {
description = "Private subnet IDs"
value = aws_subnet.private[*].id
}
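The subnet ranges in the VPC module come from `cidrsubnet(var.vpc_cidr, 8, n)`, which extends the VPC prefix by 8 bits and picks the n-th resulting network. The following sketch reproduces that arithmetic with Python's standard `ipaddress` module to show which ranges the public (`count.index`) and private (`count.index + 10`) subnets receive. The Python function is an illustrative re-implementation, not Terraform code.

```python
import ipaddress


def cidrsubnet(prefix, newbits, netnum):
    """Mimic Terraform's cidrsubnet(): extend the prefix length by
    `newbits` bits and return the subnet with index `netnum`."""
    network = ipaddress.ip_network(prefix)
    subnets = list(network.subnets(prefixlen_diff=newbits))
    return str(subnets[netnum])


if __name__ == "__main__":
    vpc_cidr = "10.0.0.0/16"
    for index in range(3):  # three availability zones, as in the module default
        print("public ", cidrsubnet(vpc_cidr, 8, index))
        print("private", cidrsubnet(vpc_cidr, 8, index + 10))
```

With the default `10.0.0.0/16` block, every subnet is a /24, and the offset of 10 keeps the private ranges clear of the public ones as long as there are at most 10 availability zones.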
EC2 instance module example:
# File: terraform/modules/ec2/main.tf
variable "instance_count" {
description = "Number of instances to create"
type = number
default = 1
}
variable "instance_type" {
description = "EC2 instance type"
type = string
default = "t3.medium"
}
variable "ami_id" {
description = "AMI ID"
type = string
}
variable "subnet_ids" {
description = "List of subnet IDs"
type = list(string)
}
variable "vpc_id" {
description = "VPC ID"
type = string
}
variable "key_name" {
description = "SSH key pair name"
type = string
}
variable "environment" {
description = "Environment name"
type = string
}
variable "application" {
description = "Application name"
type = string
}
# Security group
resource "aws_security_group" "instance" {
name = "${var.environment}-${var.application}-sg"
description = "Security group for ${var.application}"
vpc_id = var.vpc_id
ingress {
description = "SSH from anywhere"
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "HTTP from anywhere"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "HTTPS from anywhere"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
description = "Allow all outbound"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.environment}-${var.application}-sg"
Environment = var.environment
Application = var.application
}
}
# EC2 instances
resource "aws_instance" "app" {
count = var.instance_count
ami = var.ami_id
instance_type = var.instance_type
subnet_id = var.subnet_ids[count.index % length(var.subnet_ids)]
vpc_security_group_ids = [aws_security_group.instance.id]
key_name = var.key_name
root_block_device {
volume_type = "gp3"
volume_size = 50
delete_on_termination = true
encrypted = true
}
tags = {
Name = "${var.environment}-${var.application}-${count.index + 1}"
Environment = var.environment
Application = var.application
ManagedBy = "Terraform"
}
lifecycle {
create_before_destroy = true
}
}
output "instance_ids" {
description = "EC2 instance IDs"
value = aws_instance.app[*].id
}
output "instance_public_ips" {
description = "EC2 instance public IPs"
value = aws_instance.app[*].public_ip
}
output "instance_private_ips" {
description = "EC2 instance private IPs"
value = aws_instance.app[*].private_ip
}
2.2.3 Configuring Environment Variable Files
# File: terraform/environments/prod/main.tf
provider "aws" {
region = var.aws_region
default_tags {
tags = {
Environment = "production"
ManagedBy = "Terraform"
Project = "infra-automation"
}
}
}
module "vpc" {
source = "../../modules/vpc"
vpc_cidr = "10.0.0.0/16"
environment = "prod"
availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
}
module "web_servers" {
source = "../../modules/ec2"
instance_count = 3
instance_type = "t3.medium"
ami_id = var.ami_id
subnet_ids = module.vpc.public_subnet_ids
vpc_id = module.vpc.vpc_id
key_name = var.key_name
environment = "prod"
application = "web"
}
# Write the Ansible inventory
resource "local_file" "ansible_inventory" {
content = templatefile("${path.module}/inventory.tpl", {
web_servers = module.web_servers.instance_public_ips
})
filename = "${path.module}/../../../ansible/inventories/prod/hosts"
}
output "vpc_id" {
value = module.vpc.vpc_id
}
output "web_server_ips" {
value = module.web_servers.instance_public_ips
}
# File: terraform/environments/prod/variables.tf
variable "aws_region" {
description = "AWS region"
type = string
default = "us-east-1"
}
variable "ami_id" {
description = "AMI ID for EC2 instances"
type = string
}
variable "key_name" {
description = "SSH key pair name"
type = string
}
# File: terraform/environments/prod/terraform.tfvars
aws_region = "us-east-1"
ami_id = "ami-0c55b159cbfafe1f0" # Example ID; replace it with a current Amazon Linux 2 AMI for your region
key_name = "prod-key"
Ansible inventory template:
# File: terraform/environments/prod/inventory.tpl
[web_servers]
%{ for ip in web_servers ~}
${ip} ansible_user=ec2-user ansible_ssh_private_key_file=~/.ssh/prod-key.pem
%{ endfor ~}
[web_servers:vars]
ansible_python_interpreter=/usr/bin/python3
2.3 Deployment and Verification
2.3.1 Running the Terraform Deployment
# Change to the environment directory
cd terraform/environments/prod
# Initialize Terraform
terraform init
# Expected output:
# Initializing the backend...
# Successfully configured the backend "s3"!
# Initializing modules...
# Initializing provider plugins...
# Terraform has been successfully initialized!
# Validate the configuration
terraform validate
# Preview the changes
terraform plan
# Expected output:
# Terraform will perform the following actions:
# # module.vpc.aws_vpc.main will be created
# # module.vpc.aws_subnet.public[0] will be created
# # ...
# Plan: 15 to add, 0 to change, 0 to destroy.
# Apply the changes
terraform apply
# Type yes to confirm
# Expected output:
# Apply complete! Resources: 15 added, 0 changed, 0 destroyed.
# Outputs:
# vpc_id = "vpc-0123456789abcdef0"
# web_server_ips = [
# "54.123.45.67",
# "54.123.45.68",
# "54.123.45.69",
# ]
2.3.2 Verifying the Ansible Inventory
# Inspect the generated inventory file
cat ../../../ansible/inventories/prod/hosts
# Expected output:
# [web_servers]
# 54.123.45.67 ansible_user=ec2-user ansible_ssh_private_key_file=~/.ssh/prod-key.pem
# 54.123.45.68 ansible_user=ec2-user ansible_ssh_private_key_file=~/.ssh/prod-key.pem
# 54.123.45.69 ansible_user=ec2-user ansible_ssh_private_key_file=~/.ssh/prod-key.pem
#
# [web_servers:vars]
# ansible_python_interpreter=/usr/bin/python3
# Test connectivity
cd ../../../ansible
ansible web_servers -i inventories/prod/hosts -m ping
# Expected output:
# 54.123.45.67 | SUCCESS => {
# "changed": false,
# "ping": "pong"
# }
# 54.123.45.68 | SUCCESS => {
# "changed": false,
# "ping": "pong"
# }
# 54.123.45.69 | SUCCESS => {
# "changed": false,
# "ping": "pong"
# }
2.3.3 Running the Ansible Configuration
# Run the playbook (covered in detail in the following sections)
ansible-playbook -i inventories/prod/hosts playbooks/web-setup.yml
# Check the service status
ansible web_servers -i inventories/prod/hosts -m shell -a "systemctl status nginx"
3. Example Code and Configuration
3.1 Complete Configuration Examples
3.1.1 Ansible Playbook
Web server deployment playbook:
# File: ansible/playbooks/web-setup.yml
---
- name: Setup Web Servers
  hosts: web_servers
  become: yes
  vars:
    nginx_version: "1.24.0"
    app_user: "webapp"
    app_dir: "/opt/webapp"
  tasks:
    - name: Update system packages
      yum:
        name: "*"
        state: latest
        update_cache: yes
    - name: Install required packages
      yum:
        name:
          - nginx
          - python3
          - python3-pip
          - git
        state: present
    - name: Create application user
      user:
        name: "{{ app_user }}"
        system: yes
        shell: /bin/bash
        home: "{{ app_dir }}"
    - name: Create application directory
      file:
        path: "{{ app_dir }}"
        state: directory
        owner: "{{ app_user }}"
        group: "{{ app_user }}"
        mode: '0755'
    - name: Configure Nginx
      template:
        src: ../templates/nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        owner: root
        group: root
        mode: '0644'
      notify: Restart Nginx
    - name: Start and enable Nginx
      systemd:
        name: nginx
        state: started
        enabled: yes
  handlers:
    - name: Restart Nginx
      systemd:
        name: nginx
        state: restarted
Nginx configuration template:
# File: ansible/templates/nginx.conf.j2
user {{ app_user }};
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;
events {
worker_connections 1024;
}
http {
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
access_log /var/log/nginx/access.log main;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
types_hash_max_size 2048;
include /etc/nginx/mime.types;
default_type application/octet-stream;
server {
listen 80 default_server;
listen [::]:80 default_server;
server_name _;
root {{ app_dir }}/public;
location / {
try_files $uri $uri/ =404;
}
error_page 404 /404.html;
location = /404.html {
}
error_page 500 502 503 504 /50x.html;
location = /50x.html {
}
}
}
3.1.2 Ansible Role Structure
Create a reusable Ansible role:
# Create the role skeleton
cd ansible/roles
ansible-galaxy init webserver
# Generated directory structure:
# webserver/
# ├── defaults/
# │ └── main.yml
# ├── files/
# ├── handlers/
# │ └── main.yml
# ├── meta/
# │ └── main.yml
# ├── tasks/
# │ └── main.yml
# ├── templates/
# ├── tests/
# │ ├── inventory
# │ └── test.yml
# └── vars/
# └── main.yml
Role task definitions:
# File: ansible/roles/webserver/tasks/main.yml
---
- name: Include OS-specific variables
  include_vars: "{{ ansible_os_family }}.yml"
- name: Install webserver packages
  package:
    name: "{{ webserver_packages }}"
    state: present
- name: Configure webserver
  template:
    src: "{{ webserver_config_template }}"
    dest: "{{ webserver_config_path }}"
    owner: root
    group: root
    mode: '0644'
  notify: restart webserver
- name: Ensure webserver is running
  service:
    name: "{{ webserver_service_name }}"
    state: started
    enabled: yes
Role variable definitions:
# File: ansible/roles/webserver/defaults/main.yml
---
webserver_port: 80
webserver_user: nginx
webserver_worker_processes: auto
webserver_worker_connections: 1024
# File: ansible/roles/webserver/vars/RedHat.yml
---
webserver_packages:
  - nginx
  - nginx-mod-stream
webserver_service_name: nginx
webserver_config_path: /etc/nginx/nginx.conf
webserver_config_template: nginx-redhat.conf.j2
3.1.3 Terraform + Ansible Integration Script
Automated deployment script:
#!/bin/bash
# File: scripts/deploy.sh
# Purpose: run the Terraform and Ansible deployment end to end
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(dirname "$SCRIPT_DIR")"
ENVIRONMENT="${1:-prod}"
echo "==> Deploying environment: ${ENVIRONMENT}"
# 1. Run the Terraform deployment
echo "==> Step 1: run the Terraform deployment"
cd "${PROJECT_ROOT}/terraform/environments/${ENVIRONMENT}"
terraform init -upgrade
terraform validate
terraform plan -out=tfplan
read -r -p "Apply these Terraform changes? (yes/no): " confirm
if [ "$confirm" != "yes" ]; then
  echo "Deployment cancelled"
  exit 1
fi
terraform apply tfplan
rm -f tfplan
# 2. Wait for the instances to become ready
echo "==> Step 2: wait for the EC2 instances"
sleep 30
# 3. Run the Ansible configuration
echo "==> Step 3: run the Ansible configuration"
cd "${PROJECT_ROOT}/ansible"
INVENTORY_FILE="inventories/${ENVIRONMENT}/hosts"
# Test connectivity
echo "==> Testing SSH connectivity"
ansible all -i "${INVENTORY_FILE}" -m ping
# Run the playbook
echo "==> Running the playbook"
ansible-playbook -i "${INVENTORY_FILE}" playbooks/web-setup.yml
# 4. Verify the deployment
echo "==> Step 4: verify the deployment"
ansible web_servers -i "${INVENTORY_FILE}" -m shell -a "systemctl status nginx"
echo "==> Deployment complete!"
echo "==> Server addresses:"
cd "${PROJECT_ROOT}/terraform/environments/${ENVIRONMENT}"
terraform output web_server_ips
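The fixed `sleep 30` in step 2 is a guess: instances may need more or less time before sshd accepts connections. One possible alternative, sketched here as an assumption rather than as part of the original script, is to poll the SSH port until it answers. The function name `wait_for_port` and its default timings are hypothetical.

```python
import socket
import time


def wait_for_port(host, port=22, timeout=300.0, interval=5.0):
    """Return True once a TCP connection to host:port succeeds,
    or False if `timeout` seconds pass without a successful connection."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the port is at least accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Placeholder address taken from the sample output in this article.
    print(wait_for_port("54.123.45.67", timeout=10.0, interval=2.0))
```

An open port does not guarantee that cloud-init has finished, so the `ansible ... -m ping` check in step 3 is still worth keeping.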
3.2 Real-World Case
Case 1: managing infrastructure for multiple environments
Scenario: we need to manage development, testing, and production environments whose configurations differ slightly (instance count, instance size, network settings, and so on).
Steps:
1. Create environment-specific variable files:
# File: terraform/environments/dev/terraform.tfvars
aws_region = "us-east-1"
ami_id = "ami-0c55b159cbfafe1f0"
key_name = "dev-key"
instance_count = 1
instance_type = "t3.small"
vpc_cidr = "10.1.0.0/16"
# File: terraform/environments/prod/terraform.tfvars
aws_region = "us-east-1"
ami_id = "ami-0c55b159cbfafe1f0"
key_name = "prod-key"
instance_count = 5
instance_type = "t3.large"
vpc_cidr = "10.0.0.0/16"
2. Use workspaces to manage the environments:
# Create the development workspace (this also switches to it)
terraform workspace new dev
terraform workspace select dev
terraform apply -var-file="environments/dev/terraform.tfvars"
# Switch to the production workspace
# (create it first with "terraform workspace new prod" if it does not exist yet)
terraform workspace select prod
terraform apply -var-file="environments/prod/terraform.tfvars"
# List all workspaces
terraform workspace list
Result:
# Development
$ terraform workspace select dev
$ terraform apply
Apply complete! Resources: 8 added, 0 changed, 0 destroyed.
Outputs:
vpc_id = "vpc-dev-0123456789"
web_server_ips = ["54.123.45.10"]
# Production
$ terraform workspace select prod
$ terraform apply
Apply complete! Resources: 20 added, 0 changed, 0 destroyed.
Outputs:
vpc_id = "vpc-prod-0123456789"
web_server_ips = [
"54.123.45.67",
"54.123.45.68",
"54.123.45.69",
"54.123.45.70",
"54.123.45.71",
]
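4. Best Practices and Caveats
4.1 Best Practices
4.1.1 Performance Tuning
Tuning 1: Terraform parallelism
Terraform creates resources in parallel by default, and the degree of parallelism is adjustable:
# Raise the parallelism (default: 10)
terraform apply -parallelism=20
# For large deployments, higher parallelism can shorten the run considerably,
# but watch out for API rate limits on the cloud platform
In our practice, when managing 500 EC2 instances, setting the parallelism to 30 cut deployment time from 45 minutes to 15 minutes.
Tuning 2: Ansible execution efficiency
# File: ansible/ansible.cfg
[defaults]
# Raise the number of parallel forks (default: 5)
forks = 50
# Enable fact caching
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
[ssh_connection]
# Reuse SSH connections
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
pipelining = True
With these settings, configuring 100 servers went from 30 minutes to 8 minutes for us.
Tuning 3: state file layout
# Use partial backend configuration
# File: terraform/backend-config/prod.hcl
bucket = "my-terraform-state-bucket"
key = "prod/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-state-lock"
# Pass the backend configuration at init time
terraform init -backend-config=backend-config/prod.hcl
# Each environment then uses its own state file, which avoids conflicts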
4.1.2 Security Hardening
Measure 1: encrypt secrets with Ansible Vault
# Create an encrypted file
ansible-vault create ansible/group_vars/all/vault.yml
# Edit the encrypted file
ansible-vault edit ansible/group_vars/all/vault.yml
# Example file contents
---
vault_db_password: "super_secret_password"
vault_api_key: "api_key_12345"
vault_aws_access_key: "AKIAIOSFODNN7EXAMPLE"
vault_aws_secret_key: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
# Reference the values in a playbook
# File: ansible/playbooks/app-deploy.yml
---
- name: Deploy Application
  hosts: app_servers
  vars_files:
    - ../group_vars/all/vault.yml
  tasks:
    - name: Configure database connection
      template:
        src: db_config.j2
        dest: /etc/app/db.conf
      vars:
        db_password: "{{ vault_db_password }}"
# Supply the vault password when running the playbook
ansible-playbook -i inventories/prod/hosts playbooks/app-deploy.yml --ask-vault-pass
# Or use a password file (keep it out of Git)
echo "my_vault_password" > .vault_pass
chmod 600 .vault_pass
ansible-playbook -i inventories/prod/hosts playbooks/app-deploy.yml --vault-password-file .vault_pass
Measure 2: protect sensitive Terraform outputs
# File: terraform/environments/prod/outputs.tf
output "db_password" {
  description = "Database password"
  value       = random_password.db_password.result
  sensitive   = true # Redacted in CLI output; the value is still stored in plain text in the state file
}
output "rds_endpoint" {
  description = "RDS endpoint"
  value       = aws_db_instance.main.endpoint
}
# Read a sensitive output
terraform output -json | jq -r '.db_password.value'
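Because `terraform output -json` prints sensitive values in clear text, anything that logs or forwards that JSON should mask them first. The sketch below is one possible way to do that; it relies only on the `sensitive` and `value` keys that Terraform includes for each output, and the function name `redact_outputs` and the mask string are assumptions for illustration.

```python
import json


def redact_outputs(tf_outputs, mask="<sensitive>"):
    """Return {name: value} from parsed `terraform output -json` data,
    replacing the value of every output flagged as sensitive."""
    return {
        name: mask if meta.get("sensitive") else meta["value"]
        for name, meta in tf_outputs.items()
    }


if __name__ == "__main__":
    # Sample data shaped like `terraform output -json`; the values are placeholders.
    raw = """
    {
      "db_password":  {"sensitive": true,  "type": "string", "value": "s3cr3t"},
      "rds_endpoint": {"sensitive": false, "type": "string", "value": "db.example.internal:5432"}
    }
    """
    print(json.dumps(redact_outputs(json.loads(raw)), indent=2))
```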
Measure 3: principle of least privilege
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:Describe*",
        "ec2:CreateTags",
        "ec2:RunInstances",
        "ec2:TerminateInstances",
        "ec2:StopInstances",
        "ec2:StartInstances"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "us-east-1"
        }
      }
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-terraform-state-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:DeleteItem"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/terraform-state-lock"
    }
  ]
}
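A least-privilege policy is easier to keep tight if wildcards are reviewed deliberately. As a rough sketch (not a replacement for dedicated policy scanners), the following function lists every wildcard in the `Action` or `Resource` of an `Allow` statement so a reviewer can decide whether each one is justified. The function name `find_wildcards` is hypothetical.

```python
def find_wildcards(policy):
    """Return (statement_index, field, value) for every wildcard found
    in the Action or Resource of an Allow statement."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        for field in ("Action", "Resource"):
            values = statement.get(field, [])
            if isinstance(values, str):  # a single string is also valid
                values = [values]
            for value in values:
                if "*" in value:
                    findings.append((index, field, value))
    return findings


if __name__ == "__main__":
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["ec2:Describe*", "ec2:RunInstances"], "Resource": "*"},
        ],
    }
    for finding in find_wildcards(policy):
        print(finding)
```

Run against the policy above, it would flag `ec2:Describe*`, the `"Resource": "*"` on the EC2 statement, and the object-level wildcard on the state bucket, which are the places where the region condition and bucket scoping carry the restriction.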
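4.1.3 High Availability
HA option 1: multi-region deployment
# File: terraform/environments/prod/multi-region.tf
# Terraform 1.x cannot select a provider configuration dynamically
# (for example aws.region[each.key] inside for_each), so each extra region
# gets its own aliased provider and its own module block.
# us-east-1 is covered by the default provider and module "vpc" in main.tf.
provider "aws" {
  alias  = "us_west_2"
  region = "us-west-2"
}
provider "aws" {
  alias  = "eu_west_1"
  region = "eu-west-1"
}
module "vpc_us_west_2" {
  source             = "../../modules/vpc"
  providers          = { aws = aws.us_west_2 }
  vpc_cidr           = "10.1.0.0/16"
  environment        = "prod"
  availability_zones = ["us-west-2a", "us-west-2b", "us-west-2c"]
}
module "vpc_eu_west_1" {
  source             = "../../modules/vpc"
  providers          = { aws = aws.eu_west_1 }
  vpc_cidr           = "10.2.0.0/16"
  environment        = "prod"
  availability_zones = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
}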
HA option 2: automatic failover
# File: ansible/playbooks/ha-setup.yml
---
- name: Setup High Availability
  hosts: web_servers
  become: yes
  tasks:
    - name: Install Keepalived
      yum:
        name: keepalived
        state: present
    - name: Configure Keepalived
      template:
        src: ../templates/keepalived.conf.j2
        dest: /etc/keepalived/keepalived.conf
      notify: Restart Keepalived
    - name: Start Keepalived
      systemd:
        name: keepalived
        state: started
        enabled: yes
  handlers:
    - name: Restart Keepalived
      systemd:
        name: keepalived
        state: restarted
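4.2 Caveats
4.2.1 Configuration Caveats
Warning: the configuration mistakes below can cause production outages or data loss. Read them carefully.
Caveat 1: avoid Terraform state conflicts
When several people run terraform apply at the same time, the state file can be corrupted. Always use state locking:
# Wrong: local state file
# State becomes inconsistent as soon as more than one person works on it
# Right: remote backend with locking
terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock" # the lock table is mandatory
  }
}
If the lock is already held:
# Inspect the lock (the item whose LockID ends in -md5 is the state checksum, not the lock)
aws dynamodb get-item \
  --table-name terraform-state-lock \
  --key '{"LockID":{"S":"my-terraform-state-bucket/prod/terraform.tfstate"}}'
# Force-unlock (use with care; make sure nobody else is running Terraform)
terraform force-unlock LOCK_ID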
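Caveat 2: verify Ansible idempotency
Not every Ansible module is idempotent. Take particular care with the shell and command modules:
# Wrong: non-idempotent operation (appends a new line on every run)
- name: Add line to file
  shell: echo "new line" >> /etc/config.conf
# Right: use an idempotent module
- name: Add line to file
  lineinfile:
    path: /etc/config.conf
    line: "new line"
    state: present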
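The difference between the two tasks comes down to "append" versus "ensure present". The sketch below illustrates the ensure-present logic in plain Python; it is a simplified illustration of the idea behind `lineinfile` with `state: present`, not Ansible's actual implementation, and the function name `ensure_line` is hypothetical.

```python
from pathlib import Path


def ensure_line(path, line):
    """Append `line` to the file only if it is not already present.
    Returns True when the file was changed, False otherwise."""
    file = Path(path)
    text = file.read_text() if file.exists() else ""
    if line in text.splitlines():
        return False  # already present: running again changes nothing
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with file.open("a") as handle:
        handle.write(prefix + line + "\n")
    return True
```

Calling `ensure_line` twice with the same arguments changes the file once and reports "no change" the second time, which is the behaviour Ansible surfaces as `changed: false`.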
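Caveat 3: manage Terraform resource dependencies
# Declare dependencies explicitly.
# References such as subnet_id already create implicit dependencies;
# reserve depends_on for dependencies Terraform cannot infer.
resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.medium"
  subnet_id     = aws_subnet.public[0].id
  # Make sure the VPC and subnets are created first
  depends_on = [
    aws_vpc.main,
    aws_subnet.public
  ]
}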
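4.2.2 Common Errors
| Symptom | Cause | Solution |
|---|---|---|
| terraform apply hangs | The state lock is held, or there is a network problem | Check the DynamoDB lock table; release the lock with terraform force-unlock |
| Ansible connection times out | Wrong SSH key permissions or a security group problem | Fix key permissions with chmod 600 key.pem; verify the security group rules |
| Terraform state drift | Cloud resources were changed by hand | Update the state with terraform apply -refresh-only (the older terraform refresh is deprecated), or bring resources in with terraform import |
| An Ansible task fails without an error message | Errors are being ignored, for example through ignore_errors | Remove ignore_errors and check the task's return values |
| terraform destroy fails | The resource has dependencies or deletion protection | Remove it from state with terraform state rm, then delete the resource by hand |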
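4.2.3 Compatibility Issues
Version compatibility:
Terraform 1.x syntax is not fully compatible with 0.x; test before upgrading
Ansible 2.10+ uses collections, and the paths of some modules changed
AWS Provider 5.x introduced breaking changes to the arguments of some resources
Pin versions in .terraform-version and requirements.txt
Cloud platform compatibility:
API endpoints in the AWS China regions differ from the global regions and need dedicated configuration
Azure China requires the environment variable AZURE_ENVIRONMENT=AzureChinaCloud
The Alibaba Cloud Terraform provider covers fewer features, so some resources have to be managed by hand
Operating system compatibility:
Windows is not supported natively as an Ansible control node; use WSL or a Linux control node
Linux distributions use different package managers (yum/apt/zypper), which playbooks must account for
Some tools on macOS (sed, for example) behave differently from their Linux versions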
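5. Troubleshooting and Monitoring
5.1 Troubleshooting
5.1.1 Viewing Logs
# Terraform logs
# Enable verbose logging
export TF_LOG=DEBUG
export TF_LOG_PATH=./terraform-debug.log
terraform apply
# Inspect the Terraform state
terraform show
terraform state list
# Quote addresses that contain brackets so the shell does not expand them
terraform state show 'aws_instance.app[0]'
# Ansible logs
# Enable verbose output
ansible-playbook -i inventories/prod/hosts playbooks/web-setup.yml -vvv
# Follow the Ansible log file (requires log_path = /var/log/ansible.log in ansible.cfg)
tail -f /var/log/ansible.log
# Inspect gathered facts for the hosts
ansible web_servers -i inventories/prod/hosts -m setup | grep ansible_distribution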
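5.1.2 Common Problems
Problem 1: Terraform state lock
# Diagnosis
# 1. List all items in the lock table
aws dynamodb scan --table-name terraform-state-lock
# 2. Inspect the lock item (the item whose LockID ends in -md5 is the state checksum)
aws dynamodb get-item \
  --table-name terraform-state-lock \
  --key '{"LockID":{"S":"my-terraform-state-bucket/prod/terraform.tfstate"}}'
# 3. If you are sure nobody else is running Terraform, force-unlock
terraform force-unlock LOCK_ID
# 4. If the DynamoDB table is damaged, recreate it
aws dynamodb delete-table --table-name terraform-state-lock
aws dynamodb create-table \
  --table-name terraform-state-lock \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
Solutions:
Have team members coordinate before running Terraform so that runs do not overlap
Run Terraform jobs serially in a CI/CD pipeline
Clean up stale lock records regularly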
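When inspecting the lock table it helps to know which `LockID` values to expect. To the best of my knowledge the S3 backend stores the lock under `<bucket>/<key>` and a separate state checksum under the same string with an `-md5` suffix; the sketch below derives both for the default workspace. The function name `lock_table_keys` is hypothetical, and non-default workspaces use a different key path that this sketch does not cover.

```python
def lock_table_keys(bucket, key):
    """Return the LockID values used in the DynamoDB table for one state file:
    the lock item itself and the item holding the state checksum."""
    state_path = f"{bucket}/{key}"
    return {"lock": state_path, "digest": f"{state_path}-md5"}


if __name__ == "__main__":
    for name, lock_id in lock_table_keys(
        "my-terraform-state-bucket", "prod/terraform.tfstate"
    ).items():
        print(name, lock_id)
```

The lock item exists only while an operation holds the lock, whereas the `-md5` item is normally always present, so seeing only the latter in a table scan means nothing is locked.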
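Problem 2: Ansible connection failures
# Diagnosis
# 1. Test the SSH connection
ssh -i ~/.ssh/prod-key.pem ec2-user@54.123.45.67
# 2. Check the key permissions
ls -la ~/.ssh/prod-key.pem
# Should be -rw------- (600)
# 3. Fix the permissions
chmod 600 ~/.ssh/prod-key.pem
# 4. Test the Ansible connection
ansible web_servers -i inventories/prod/hosts -m ping -vvv
# 5. Check the security group rules
aws ec2 describe-security-groups \
  --group-ids sg-0123456789abcdef0 \
  --query 'SecurityGroups[0].IpPermissions'
# 6. Add an SSH rule if it is missing (prefer a trusted CIDR over 0.0.0.0/0)
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 22 \
  --cidr 0.0.0.0/0
Problem 3: importing resources into Terraform
When a resource was created by hand, it has to be imported before Terraform can manage it:
# 1. Write the resource configuration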
cat > import-resource.tf <
5.1.3 Debug Mode
Terraform debugging:
# Enable trace logging
export TF_LOG=TRACE
export TF_LOG_PATH=./terraform-trace.log
# Log provider plugin activity
export TF_LOG_PROVIDER=TRACE
# Run the operation
terraform apply
# Analyze the log
grep "ERROR" terraform-trace.log
grep "aws_instance" terraform-trace.log | head -20
Ansible debugging:
# Step through the playbook starting at a given task
ansible-playbook -i inventories/prod/hosts playbooks/web-setup.yml \
  --start-at-task="Configure Nginx" \
  --step
# Print variables with the debug module
cat > debug-playbook.yml <
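5.2 Performance Monitoring
5.2.1 Key Metrics
Terraform execution time:
# Record the execution time
time terraform apply
# Analyze resource creation times
# (-json needs a saved plan file or -auto-approve)
terraform apply -json tfplan | jq -r '.["@message"]' | grep "Creation complete"
# Use Terraform Cloud
# Terraform Cloud shows a detailed timeline for every run
Ansible execution performance:
# File: ansible/ansible.cfg
[defaults]
# Enable profiling callbacks (they ship in the ansible.posix collection;
# callbacks_enabled replaces the older callback_whitelist setting)
callbacks_enabled = profile_tasks, timer
# Limit the number of tasks shown in the timing report
[callback_profile_tasks]
task_output_limit = 20
# Run the playbook and read the performance report
ansible-playbook -i inventories/prod/hosts playbooks/web-setup.yml
# Sample output: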
# PLAY RECAP *********************************************************************
# 54.123.45.67 : ok=10 changed=5 unreachable=0 failed=0
# 54.123.45.68 : ok=10 changed=5 unreachable=0 failed=0
# 54.123.45.69 : ok=10 changed=5 unreachable=0 failed=0
#
# Tuesday 15 January 2024 1045 +0000 (002.345) 023.456 ******
# ===============================================================================
# Install Nginx --------------------------------------------------------- 45.23s
# Update system packages ------------------------------------------------ 32.15s
# Configure Nginx ------------------------------------------------------- 12.34s
# ...
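5.2.2 Monitoring Metrics
| Metric | Normal range | Alert threshold | Notes |
|---|---|---|---|
| Terraform apply time | < 10 min | > 30 min | Duration of large deployments |
| Terraform plan time | < 2 min | > 5 min | Duration of configuration validation |
| Ansible run time | < 5 min per 100 hosts | > 15 min per 100 hosts | Duration of configuration management |
| SSH connection success rate | > 99% | < 95% | Network connectivity |
| State file size | < 10 MB | > 50 MB | State file bloat |
| Resource creation success rate | > 98% | < 90% | Cloud platform stability |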
5.2.3 Alerting Configuration
CloudWatch alarms:
# File: terraform/modules/monitoring/cloudwatch.tf
resource "aws_cloudwatch_metric_alarm" "ec2_cpu_high" {
alarm_name = "ec2-cpu-utilization-high"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "CPUUtilization"
namespace = "AWS/EC2"
period = "300"
statistic = "Average"
threshold = "80"
alarm_description = "This metric monitors ec2 cpu utilization"
alarm_actions = [aws_sns_topic.alerts.arn]
dimensions = {
InstanceId = aws_instance.app[0].id
}
}
resource "aws_sns_topic" "alerts" {
name = "infrastructure-alerts"
}
resource "aws_sns_topic_subscription" "email" {
topic_arn = aws_sns_topic.alerts.arn
protocol = "email"
endpoint = "ops-team@example.com"
}
Prometheus configuration:
# File: monitoring/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets:
          - '54.123.45.67:9100'
          - '54.123.45.68:9100'
          - '54.123.45.69:9100'
  - job_name: 'nginx'
    static_configs:
      - targets:
          - '54.123.45.67:9113'
          - '54.123.45.68:9113'
          - '54.123.45.69:9113'
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'alertmanager:9093'
rule_files:
  - 'alerts.yml'
# File: monitoring/alerts.yml
groups:
  - name: infrastructure
    interval: 30s
    rules:
      - alert: HighCPUUsage
        # node_cpu_seconds_total is a counter, so usage is derived from the rate of idle time
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage is above 80% for 5 minutes"
      - alert: DiskSpaceLow
        expr: node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Disk space is low"
          description: "Less than 10% disk space available"
      - alert: ServiceDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Service is down"
          description: "{{ $labels.job }} on {{ $labels.instance }} is down"
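A CPU alert on `node_cpu_seconds_total` has to work from the rate of the idle counter, because the metric is a cumulative number of seconds rather than a percentage. The worked example below shows the arithmetic for a single core and two counter samples; the function name `cpu_usage_percent` and the sample numbers are illustrative assumptions.

```python
def cpu_usage_percent(idle_start, idle_end, elapsed_seconds, cores=1):
    """CPU usage in percent, derived from two samples of the cumulative
    idle-seconds counter taken `elapsed_seconds` apart."""
    # Fraction of the available CPU time that was spent idle in the window.
    idle_fraction = (idle_end - idle_start) / (elapsed_seconds * cores)
    return 100.0 * (1.0 - idle_fraction)


if __name__ == "__main__":
    # 60 idle seconds accumulated over a 300-second window on one core.
    usage = cpu_usage_percent(1000.0, 1060.0, 300.0)
    print(f"{usage:.1f}% busy")
```

With 60 idle seconds in a 5-minute window the core was idle 20% of the time, so usage sits right at the 80% boundary used in the alert description.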
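5.3 Backup and Recovery
5.3.1 Backup Strategy
#!/bin/bash
# File: scripts/backup.sh
# Purpose: back up the Terraform state and the Terraform/Ansible configuration
# Run from the project root so that the terraform/ and ansible/ paths resolve
set -e
TIMESTAMP="$(date +%Y%m%d-%H%M%S)"
BACKUP_DIR="/backup/infra-automation/${TIMESTAMP}"
mkdir -p "${BACKUP_DIR}"/{terraform,ansible}
echo "==> Starting infrastructure configuration backup"
# 1. Back up the Terraform state file
echo "==> Backing up Terraform state"
aws s3 cp s3://my-terraform-state-bucket/prod/terraform.tfstate \
  "${BACKUP_DIR}/terraform/terraform.tfstate"
# 2. Back up the Terraform configuration
echo "==> Backing up Terraform configuration"
tar -czf "${BACKUP_DIR}/terraform/config.tar.gz" \
  terraform/
# 3. Back up the Ansible configuration
echo "==> Backing up Ansible configuration"
tar -czf "${BACKUP_DIR}/ansible/config.tar.gz" \
  ansible/
# 4. Upload to S3
echo "==> Uploading to S3"
aws s3 sync "${BACKUP_DIR}" \
  "s3://my-backup-bucket/infra-automation/${TIMESTAMP}/"
# 5. Remove old local backups (keep the last 7 days)
find /backup/infra-automation -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +
echo "==> Backup complete: ${BACKUP_DIR}"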
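The 7-day retention rule in step 5 can also be evaluated from the timestamp embedded in each backup directory name instead of the filesystem modification time. The sketch below shows that variant; it is an assumption layered on the `YYYYmmdd-HHMMSS` naming used by the script, and the function name `expired_backups` is hypothetical.

```python
from datetime import datetime, timedelta


def expired_backups(names, now, keep_days=7):
    """Return the backup directory names (format YYYYmmdd-HHMMSS)
    whose embedded timestamp is older than `keep_days` days."""
    cutoff = now - timedelta(days=keep_days)
    expired = []
    for name in names:
        try:
            created = datetime.strptime(name, "%Y%m%d-%H%M%S")
        except ValueError:
            continue  # ignore entries that do not look like backup directories
        if created < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    listing = ["20240101-000000", "20240110-120000", "notes"]
    print(expired_backups(listing, now=datetime(2024, 1, 15, 12, 0, 0)))
```

Parsing the name keeps the decision stable even if a directory's modification time changes, for example after a restore test touches its contents.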
5.3.2 Recovery Procedure
1. Stop automation:
# Pause the CI/CD pipelines
# Ask team members to hold off on manual changes
2. Restore the Terraform state:
# Download the backup
aws s3 cp s3://my-backup-bucket/infra-automation/20240115-120000/terraform/terraform.tfstate \
  ./terraform.tfstate.backup
# Restore it to the S3 backend
aws s3 cp ./terraform.tfstate.backup \
  s3://my-terraform-state-bucket/prod/terraform.tfstate
# Verify the state
# (if terraform init reports a state checksum mismatch, update or delete
# the matching -md5 item in the DynamoDB lock table)
cd terraform/environments/prod
terraform init
terraform plan
3. Restore the Ansible configuration:
# Download the backup
aws s3 cp s3://my-backup-bucket/infra-automation/20240115-120000/ansible/config.tar.gz \
  ./ansible-backup.tar.gz
# Unpack it
tar -xzf ansible-backup.tar.gz
# Verify the configuration
ansible-playbook -i ansible/inventories/prod/hosts ansible/playbooks/web-setup.yml --check
4. Verify the result:
# Verify the Terraform resources
terraform state list
terraform show
# Verify Ansible connectivity
ansible all -i ansible/inventories/prod/hosts -m ping
# Verify the service status
ansible web_servers -i ansible/inventories/prod/hosts -m shell -a "systemctl status nginx"
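6. Summary
6.1 Key Points
Infrastructure as code: Terraform and Ansible turn infrastructure and configuration management into code, which brings version control, traceability, and repeatable deployments. Every change is recorded as a Git commit, so a complete audit log comes for free. In our practice this cut environment deployment time from 2 days to 2 hours.
Hybrid orchestration: Terraform handles the infrastructure lifecycle (create, update, destroy), and Ansible handles configuration management and application deployment. The two integrate through dynamic inventories, output variables, and the local-exec provisioner to form a complete automation system.
State management and locking: the Terraform remote backend (S3 + DynamoDB) keeps state consistent and safe under concurrent use by a team. The state file records the current state of the infrastructure, supports versioning and rollback, and is the core of infrastructure management.
Modularity and reuse: Terraform modules and Ansible roles package common infrastructure patterns and configuration tasks for reuse across projects and environments. Our team maintains 20+ Terraform modules and 30+ Ansible roles, which cover about 90% of common scenarios.
Multi-environment strategy: Terraform workspaces, environment-specific variable files, and Ansible inventory groups give development, testing, and production a shared code base with per-environment differences. The same code deploys to each environment with only a few parameters changed.
Security and compliance: Ansible Vault for secrets, sensitive Terraform outputs, and least-privilege IAM policies keep infrastructure management secure. Every change is auditable through Git, which meets compliance requirements.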
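6.2 Directions for further study
Direction 1: Advanced Terraform features
Dig deeper into enterprise use of Terraform:
Terraform Cloud/Enterprise: team collaboration, remote execution, policy as code (Sentinel), private module registry
Dynamic configuration generation: use for_each, count, and dynamic blocks for more flexible resource management
Custom provider development: build Terraform providers for internal systems so they can be managed in the same way
Terraform testing: use Terratest to test infrastructure code
Learning resources:
Terraform official documentation
Terraform: Up & Running - an in-depth, hands-on book on Terraform
HashiCorp Learn - official tutorials
Practical advice: Adopt Terraform Cloud in production and make use of its remote execution and policy management. Our team uses Sentinel policies to ensure that every EC2 instance has encryption enabled and every S3 bucket has versioning enabled.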
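Direction 2: Deeper Ansible automation
Master the advanced features of Ansible:
Ansible Tower/AWX: an enterprise automation platform with a web UI, RBAC, job scheduling, and audit logs
Dynamic inventory plugins: fetch host lists dynamically from a CMDB, cloud platforms, Kubernetes, and other data sources
Custom module development: write Ansible modules in Python to extend functionality
Ansible Collections: use and publish Ansible Collections for modular management
Learning resources:
Ansible official documentation
Ansible for DevOps - a hands-on guide
Ansible Galaxy - community roles and collections
Practical advice: Deploy Ansible Tower/AWX to enable self-service infrastructure management. Development teams can trigger predefined playbooks from the web UI without direct access to production.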
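Direction 3: CI/CD integration
Integrate Terraform and Ansible into CI/CD pipelines:
GitLab CI/CD: run Terraform and Ansible automatically from GitLab pipelines
Jenkins Pipeline: write a Jenkinsfile for continuous delivery of infrastructure
GitHub Actions: implement a GitOps workflow with GitHub Actions
Policy validation: integrate tools such as OPA and Checkov to validate policies
Learning resources:
GitLab CI/CD documentation
Terraform in CI/CD
Practical advice: Implement a complete GitOps pipeline in which every infrastructure change goes through a PR that triggers automated testing and deployment. Our pipeline consists of: code checks → Terraform plan → manual approval → Terraform apply → Ansible configuration → verification tests.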
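The pipeline order described above (checks, plan, approval, apply, configuration, verification) can be sketched as a plain shell script in which each stage runs only if the previous one succeeded. This is an illustrative sketch, not the authors' pipeline: `run_stage`, `confirm_apply`, and `run_pipeline` are hypothetical names, and in a real CI system the approval step would normally be a manual gate provided by the platform rather than a terminal prompt.

```shell
# run_stage NAME COMMAND [ARGS...]
# Prints a stage banner, runs the command, and reports failure on stderr.
run_stage() {
  local name
  name="$1"
  shift
  echo "==> ${name}"
  if ! "$@"; then
    echo "stage failed: ${name}" >&2
    return 1
  fi
}

# confirm_apply
# Stand-in for manual approval: succeeds only if the answer is exactly "yes".
confirm_apply() {
  local answer
  printf 'Apply the saved plan? [yes/no] '
  read -r answer
  [ "$answer" = "yes" ]
}

# run_pipeline
# Chains the stages with &&, so the first failure stops everything after it.
run_pipeline() {
  run_stage "validate"  terraform validate &&
  run_stage "plan"      terraform plan -out=tfplan &&
  run_stage "approve"   confirm_apply &&
  run_stage "apply"     terraform apply tfplan &&
  run_stage "configure" ansible-playbook -i ansible/inventories/prod/hosts \
                          ansible/playbooks/web-setup.yml &&
  run_stage "verify"    ansible all -i ansible/inventories/prod/hosts -m ping
}
```

Applying the saved `tfplan` file rather than re-planning means the apply stage executes exactly what was reviewed at the approval step.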
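6.3 References
Terraform official documentation - complete Terraform usage guide and API reference
Ansible official documentation - complete Ansible usage guide and module reference
AWS Provider documentation - detailed documentation for the Terraform AWS Provider
Terraform Registry - marketplace for Terraform modules and providers
Ansible Galaxy - marketplace for Ansible roles and collections
HashiCorp Learn - official HashiCorp learning platform
Infrastructure as Code - a classic book on IaC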
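Appendix
A. Command cheat sheet
# Common Terraform commands
terraform init                      # Initialize the working directory
terraform init -upgrade             # Upgrade provider plugins
terraform validate                  # Validate configuration syntax
terraform fmt                       # Format configuration files
terraform plan                      # Preview changes
terraform plan -out=tfplan          # Save the execution plan
terraform apply                     # Apply changes
terraform apply tfplan              # Apply a saved plan
terraform destroy                   # Destroy all resources
terraform show                      # Show the current state
terraform state list                # List all resources
terraform state show ADDRESS        # Show details of one resource
terraform state rm ADDRESS          # Remove a resource from state
terraform import ADDRESS ID         # Import an existing resource
terraform output                    # Show output values
terraform workspace list            # List all workspaces
terraform workspace new NAME        # Create a workspace
terraform workspace select NAME     # Switch to a workspace
terraform force-unlock LOCK_ID      # Force-unlock the state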
# Common Ansible commands
ansible --version                                          # Show the version
ansible all -m ping                                        # Test connectivity
ansible all -m setup                                       # Gather host facts
ansible all -m shell -a "uptime"                           # Run a command
ansible-playbook playbook.yml                              # Run a playbook
ansible-playbook playbook.yml --check                      # Check mode (no changes made)
ansible-playbook playbook.yml --diff                       # Show file differences
ansible-playbook playbook.yml -vvv                         # Verbose output
ansible-playbook playbook.yml --start-at-task="task name"  # Start at the named task
ansible-playbook playbook.yml --tags "tag1,tag2"           # Run only the given tags
ansible-playbook playbook.yml --skip-tags "tag1"           # Skip the given tags
ansible-vault create file.yml                              # Create an encrypted file
ansible-vault edit file.yml                                # Edit an encrypted file
ansible-vault encrypt file.yml                             # Encrypt a file
ansible-vault decrypt file.yml                             # Decrypt a file
ansible-galaxy init role_name                              # Create a role skeleton
ansible-galaxy install role_name                           # Install a role
ansible-doc -l                                             # List all modules
ansible-doc module_name                                    # Show module documentation
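When these commands are scripted rather than typed, `terraform plan -detailed-exitcode` is useful because its exit status separates "no changes" (0), "error" (1), and "changes present" (2). The sketch below maps that status to a label a script can branch on; `plan_status` is a hypothetical helper name.

```shell
# plan_status EXIT_CODE
# Maps the exit status of `terraform plan -detailed-exitcode` to a label:
#   0 -> no-changes, 2 -> changes-pending, anything else -> error
plan_status() {
  case "$1" in
    0) echo "no-changes" ;;
    2) echo "changes-pending" ;;
    *) echo "error" ;;
  esac
}

# Example use (commented out so that this block has no side effects):
#   terraform plan -detailed-exitcode -out=tfplan
#   status=$(plan_status $?)
#   [ "$status" = "changes-pending" ] && terraform apply tfplan
```

A scheduled job that runs the plan and alerts on `changes-pending` is one simple way to notice resource drift.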
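B. Configuration parameters in detail
Core Terraform parameters:
| Parameter | Type | Description |
|---|---|---|
| required_version | string | Terraform version constraint |
| required_providers | map | Provider version constraints |
| backend | block | Remote backend configuration |
| variable | block | Input variable definition |
| output | block | Output value definition |
| locals | block | Local value definitions |
| resource | block | Resource definition |
| data | block | Data source definition |
| module | block | Module call |
| depends_on | list | Explicit dependency declaration |
| count | number | Number of resource instances |
| for_each | map/set | Mapping of resource instances |
| lifecycle | block | Lifecycle management |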
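Core Ansible parameters:
| Parameter | Type | Description |
|---|---|---|
| hosts | string | Target host group |
| become | boolean | Whether to escalate privileges |
| become_user | string | User to become |
| vars | map | Variable definitions |
| vars_files | list | List of variable files |
| tasks | list | List of tasks |
| handlers | list | List of handlers |
| roles | list | List of roles |
| tags | list | Task tags |
| when | expression | Conditional execution |
| loop | list | Loop execution |
| register | string | Register a variable |
| notify | list | Trigger handlers |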
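C. Glossary
| Term (Chinese) | English | Explanation |
|---|---|---|
| 基础设施即代码 | Infrastructure as Code (IaC) | The practice of defining and managing infrastructure with code |
| 声明式 | Declarative | Describes the desired end state rather than the steps to reach it |
| 幂等性 | Idempotency | The property that repeating the same operation yields the same result |
| 状态文件 | State File | The file Terraform uses to track resource state |
| 状态锁 | State Lock | The mechanism that prevents concurrent modification of the state file |
| Provider | Provider | A Terraform plugin that connects to a cloud platform or service |
| Module | Module | A reusable unit of Terraform configuration |
| Workspace | Workspace | The Terraform mechanism for managing multiple environments |
| Playbook | Playbook | A YAML file that defines tasks in Ansible |
| Role | Role | A reusable collection of tasks in Ansible |
| Inventory | Inventory | The file that defines the host list in Ansible |
| Handler | Handler | A special Ansible task that runs in response to a notification |
| Fact | Fact | Host information gathered by Ansible |
| Vault | Vault | The Ansible tool for encrypting sensitive data |
| 动态Inventory | Dynamic Inventory | A host list fetched dynamically from an external data source |
| 远程后端 | Remote Backend | Remote storage for the Terraform state file |
| 资源漂移 | Resource Drift | A mismatch between the actual resource state and the code definition |
| 金丝雀部署 | Canary Deployment | A strategy that rolls a new version out to production gradually |
| 蓝绿部署 | Blue-Green Deployment | A strategy that achieves zero-downtime deployment by switching environments |
| 不可变基础设施 | Immutable Infrastructure | The practice of replacing resources with new ones instead of modifying them |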
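Original title: Terraform + Ansible Hybrid Orchestration: Building an Automated Operations System for Large-Scale Heterogeneous Environments
Source: WeChat ID magedu-Linux, WeChat official account 马哥Linux运维. Please credit the source when reposting.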