This section discusses how to plan a single replica set. The aim is to ensure that the rate of change in the data set can be accommodated by the available network bandwidth and the replication schedule.
The first aspect to consider is the bandwidth available between the members of the replica set.
If you are using schedules and hubs with many outgoing connections, it is a best practice to stagger the schedules so that the hub computer does not try to deliver a backlog of replication traffic to all partners at once. This reduces the amount of simultaneous disk I/O that the hub server needs to perform and helps to reduce timeouts from partners. How much to stagger the schedules depends on the number of connections to the hub and on the CPU and disk performance of the hub server.
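For example (an illustrative schedule only, not a prescription): a hub with 12 branch connections might open replication to branches 1 through 4 at 1:00 A.M., to branches 5 through 8 at 2:00 A.M., and to branches 9 through 12 at 3:00 A.M., so that the hub is never building and sending staging files for all 12 partners at the same time.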
Note that scheduling is the only mechanism that FRS provides for throttling replication traffic.
When a file is modified, FRS sends a complete copy of the resultant file to the computer’s replication partners.
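For example, if a user changes a single paragraph in a 10-MB document, FRS stages and sends the entire file to each partner, not just the changed bytes, so frequent small edits to large files can generate substantial replication traffic.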
If you have an existing data set that you want to replicate, use the following procedure to estimate how much replication traffic will be generated in a given time period:
Note that both <path> and <temp_dir> should be on NTFS volumes.
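One way to make this estimate is sketched below; it assumes the standard Windows xcopy and compact commands, and <path>, <temp_dir>, and the date are placeholders that you supply. The idea is to copy the files that changed during the measurement period to a temporary folder and then compress that copy, because compact reports both the original and the compressed sizes:

    rem Copy files changed on or after the start of the measurement period
    xcopy <path> <temp_dir> /s /i /d:mm-dd-yyyy

    rem Compress the copied files; compact reports original and compressed sizes
    compact /c /s:<temp_dir>

The compact command finishes with summary output similar to the following: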
    405 files within 11 directories were compressed.
    19,862,282 total bytes of data are stored in 10,677,439 bytes.
    The compression ratio is 1.9 to 1.
Note that the replication schedule also plays an important role here. If there are multiple versions of a file in a computer's FRS replication queue, FRS sends only the most recent version, not all of the intermediate versions. For this reason, a schedule coalesces file changes and reduces bandwidth usage at the expense of a short-term backlog.
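For example, if a spreadsheet in the replica tree is saved 20 times while the outbound schedule is closed, only the final version is sent when the schedule next opens, so the connection carries one file transfer rather than 20.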
If the topology has multiple hops between the server where a change originates and the furthest downstream member, and the connections have schedules defined, the system designer should consider the likely propagation delay before a change reaches that final target.
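As a rough worked example (assuming, for illustration, that each connection replicates only during a nightly window): with three hops between the originating server and the furthest member, a change made shortly after a window closes can take up to three nights to arrive, because each hop must wait for its own next scheduled window before forwarding the file.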
When using FRS to replicate DFS links as replica sets, there is no fixed limit to the number of replica sets in which a single file server can participate; however, we recommend hosting no more than 150 replica sets on a single server to maintain replication performance. The optimal number of replica sets for servers in your organization depends on the CPU, memory, disk input/output (I/O) throughput, and the amount of data that changes.
You can use the Distributed File System snap-in to create filters that exclude subfolders or files with certain extensions from replication. By default, FRS excludes file names that begin with a tilde (~) character and files with the .bak and .tmp extensions.
You need to know whether the amount of replicated data will grow over time so that you can ensure that your topology, schedule, and bandwidth can handle the additional data.
If you plan to deploy a small number of servers at first and then deploy additional servers over time, you need to ensure that your topology and bandwidth can handle the new servers.