Building a Deployment Plan for an FRS Replica Set

This section discusses how to plan a single replica set. The aim is to ensure that the rate of change in the data set can be accommodated by the available communication bandwidth and schedule.

Planning for bandwidth availability

The first aspect to consider is the bandwidth available between the members of the replica set.

If you are using schedules and hubs with many outgoing connections, it is a best practice to stagger the schedules so that the hub computer does not try to deliver a backlog of replication traffic to all partners at once. Staggering reduces the amount of simultaneous disk I/O that the hub server must perform and helps to reduce timeouts from partners. How far apart to stagger the schedules depends on the number of connections to the hub and on the CPU and disk performance of the hub server.

Note that scheduling is the only mechanism that FRS provides for throttling replication traffic.
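The staggering described above can be sketched as a small calculation. The following Python example is only an illustration: the partner names, the 60-minute window, and the even spacing are assumptions, not values or behavior defined by FRS. It spreads the start of each partner's replication window evenly across one window so that the hub does not serve every partner at the same moment.

```python
from datetime import timedelta


def staggered_offsets(partner_names, window_minutes=60):
    """Return a start offset for each partner, spread evenly across one window.

    partner_names and window_minutes are hypothetical inputs chosen for
    illustration; FRS itself does not calculate these offsets.
    """
    if not partner_names:
        return {}
    step = window_minutes / len(partner_names)
    return {
        name: timedelta(minutes=round(index * step))
        for index, name in enumerate(partner_names)
    }


# Hypothetical hub with four outgoing connections.
partners = ["BRANCH01", "BRANCH02", "BRANCH03", "BRANCH04"]
for name, offset in staggered_offsets(partners).items():
    print(f"{name}: open the schedule {offset} after the start of the window")
```

With four partners and a 60-minute window, the offsets are 0, 15, 30, and 45 minutes. A slower hub server or a larger number of connections would call for wider spacing.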

Planning for data size and change rate

When a file is modified, FRS sends a complete copy of the resultant file to the computer’s replication partners.

If you have an existing data set that you want to replicate, use the following procedure to estimate how much replication traffic will be generated in a given time period:

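  1. Use ATTRIB -A <path> /S to clear the archive attribute on the files in the data set.
  2. Use the data set at <path> as normal. Files that are created or modified have their archive attribute set again.
  3. After the chosen interval (for example, one day), use XCOPY /A /S <path> <temp_dir> to copy only the files that have the archive attribute set.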

    Note that both <path> and <temp_dir> should be on NTFS volumes.

  4. Compress the files in <temp_dir> by running COMPACT /C /S:<temp_dir>. When the command completes, its summary report shows how many bytes are used to store the files, for example:
    405 files within 11 directories were compressed.
    19,862,282 total bytes of data are stored in 10,677,439 bytes
    The compression ratio is 1.9 to 1.
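The figure from the COMPACT summary can be turned into a rough transfer-time estimate. The following Python sketch assumes that the compressed size is a reasonable proxy for the replication traffic generated in the interval. The 128 Kbps link speed and the 50 percent usable fraction are hypothetical values for illustration, and the function names are not part of any FRS tool.

```python
import re


def compressed_bytes(compact_summary: str) -> int:
    """Extract the 'stored in N bytes' figure from a COMPACT summary line."""
    match = re.search(r"stored in ([\d,]+) bytes", compact_summary)
    if match is None:
        raise ValueError("no 'stored in N bytes' figure found")
    return int(match.group(1).replace(",", ""))


def transfer_seconds(num_bytes: int, link_kbps: float,
                     usable_fraction: float = 0.5) -> float:
    """Seconds needed to send num_bytes when only part of the link is usable.

    usable_fraction is an assumed share of the link left for replication.
    """
    usable_bits_per_second = link_kbps * 1000 * usable_fraction
    return num_bytes * 8 / usable_bits_per_second


summary = "19,862,282 total bytes of data are stored in 10,677,439 bytes"
size = compressed_bytes(summary)
seconds = transfer_seconds(size, link_kbps=128, usable_fraction=0.5)
print(f"{size} bytes take about {seconds / 60:.0f} minutes per partner")
```

Under these assumptions, the sample data set needs roughly 22 minutes per partner per interval. Compare that figure with the length of the replication window that the schedule allows.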
    
    

Note that the replication schedule also plays an important role here. If the FRS replication queue on a computer holds multiple versions of a file, FRS sends only the most recent version, not all of the intermediate versions. For this reason, a schedule coalesces file changes and reduces bandwidth usage at the expense of a short-term backlog.
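The effect of coalescing can be illustrated with a simple model. The following Python sketch is not the FRS implementation. It only compares the bytes sent when every version of a file is transmitted with the bytes sent when only the latest queued version of each file is transmitted. The file names and sizes are made up.

```python
def bytes_sent(changes, coalesce):
    """Total bytes sent for a list of (file_name, size_in_bytes) changes.

    changes is ordered by modification time. When coalesce is True, only
    the most recent version of each file is counted.
    """
    if not coalesce:
        return sum(size for _, size in changes)
    latest = {}
    for name, size in changes:
        latest[name] = size  # a later version replaces an earlier one
    return sum(latest.values())


# Hypothetical changes queued while the schedule is closed.
queued = [
    ("report.doc", 100_000),
    ("report.doc", 120_000),
    ("report.doc", 125_000),
    ("budget.xls", 40_000),
]
print(bytes_sent(queued, coalesce=False))  # every version: 385000
print(bytes_sent(queued, coalesce=True))   # latest versions only: 165000
```

In this model, a longer closed period gives frequently edited files more chances to coalesce, which is the trade-off between bandwidth and backlog described above.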

Expected propagation requirements

If the topology has multiple levels between the master and its furthest node (measured as the number of hops in the replication topology), and the connections have schedules defined, then the system designer should consider the likely propagation delay for a changed file to reach that final target.
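A worst-case propagation delay can be approximated by adding up the delay at each hop. The following Python sketch assumes that a change arrives at each hop just after that hop's schedule has closed, so it waits for the longest gap between open windows and then takes some time to transfer. The hop values are hypothetical, and the model ignores retries and queueing behind other files.

```python
def worst_case_delay_hours(hops):
    """Sum the worst-case wait and the transfer time across every hop.

    hops is a list of (max_wait_hours, transfer_hours) tuples, one per
    connection on the path from the master to the furthest node.
    """
    return sum(wait + transfer for wait, transfer in hops)


# Hypothetical two-hop path: master -> hub -> branch.
path = [
    (4.0, 0.5),  # master to hub: schedule closed for up to 4 hours
    (8.0, 1.0),  # hub to branch: schedule closed for up to 8 hours
]
print(worst_case_delay_hours(path))  # 13.5
```

If the result exceeds the delay that users of the furthest node can tolerate, the schedules on the path need to open more often or the topology needs fewer hops.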

Planning the number of replica sets per server

When using FRS to replicate DFS links as replica sets, there is no fixed limit to the number of replica sets that a single file server can be involved in; however, we recommend that you host no more than 150 different replica sets on a single server to ensure replication performance. The optimal number of replica sets for servers in your organization depends on the CPU, memory, disk input/output (I/O) throughput, and the amount of data changed.

Configuring filters to exclude files and folders from replication

You can use the Distributed File System snap-in to create filters that exclude subfolders or files with certain extensions from replication. By default, the following files are excluded from FRS replication:

Expected growth of replicated data

Determine whether the amount of replicated data is expected to grow over time so that you can ensure that your topology, schedule, and bandwidth can handle the additional data.

Expected increase in the number of replica members

If you plan to deploy a small number of servers at first and then deploy additional servers over time, you need to ensure that your topology and bandwidth can handle the new servers.
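Growth in data and growth in members both increase the load on a hub, because each changed file is sent to every direct replication partner. The following Python sketch projects the hub's daily outbound volume under assumed growth figures. The starting volume, the 10 percent growth per period, and the two partners added per period are hypothetical planning inputs.

```python
def hub_outbound_bytes(daily_changed_bytes, partners):
    """Daily bytes a hub sends when every change goes to every partner."""
    return daily_changed_bytes * partners


def projection(daily_changed_bytes, partners, data_growth,
               partners_added, periods):
    """Return (period, partners, outbound_bytes) rows for each period.

    data_growth is the assumed fractional growth in changed data per
    period; partners_added is the assumed number of new partners per period.
    """
    rows = []
    for period in range(periods + 1):
        outbound = round(hub_outbound_bytes(daily_changed_bytes, partners))
        rows.append((period, partners, outbound))
        daily_changed_bytes *= 1 + data_growth
        partners += partners_added
    return rows


for period, partners, outbound in projection(
        10_000_000, partners=5, data_growth=0.10,
        partners_added=2, periods=2):
    print(f"period {period}: {partners} partners, {outbound:,} bytes per day")
```

In this example, the hub's daily outbound volume more than doubles in two periods, from 50,000,000 to 108,900,000 bytes, even though the changed data grows by only 21 percent.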