TCP checksum

Compute the checksum used in the TCP header

  • The TCP checksum is a 16-bit field in the TCP header used for error
    detection.
  • As with the IP checksum, the TCP checksum is the 16-bit one’s
    complement of the one’s complement sum of all 16-bit words in the
    data being checksummed.
  • (Figure: checksum)
  • The checksum computation for both UDP datagrams and TCP segments
    includes a 12-byte pseudo header.
  • The TCP checksum is computed over:
    • the 12-byte TCP pseudo header, built from IP header fields plus a
      computed length (see the table TCP “Pseudo Header” For Checksum
      Calculation):
      • ipHdr srcIP: 4 bytes
      • ipHdr dstIP: 4 bytes
      • reserved: 1 byte, 0x00
      • protocol: 1 byte, from ipHdr, 0x06 for TCP
      • computed TCP length: 2 bytes
    • the original TCP segment (its length is the TCP length above),
      with padding if needed:
      • TCP header
      • TCP data, padded as needed with a zero byte at the end to make
        the length a multiple of two bytes

TCP “Pseudo Header” For Checksum Calculation

Field Name | Bytes | Description
Source Address | 4 | The 32-bit IP address of the originator of the datagram, taken from the IP header
Destination Address | 4 | The 32-bit IP address of the intended recipient of the datagram, also from the IP header
Reserved | 1 | 8 bits of zeroes
Protocol | 1 | The Protocol field from the IP header. This indicates what higher-layer protocol is carried in the IP datagram. Of course, we already know what this protocol is, it’s TCP! So, this field will normally have the value 6
TCP Length | 2 | The length of the TCP segment, including both header and data. Note that this is not a specific field in the TCP header; it is computed

/**
 * Compute the checksum used in the TCP header.
 * Requires <cstring> (memcpy) and <vector>.
 * @param cleanChecksum true to zero the original checksum field before
 *        computing
 * @param ipHdr the IP header
 * @return the checksum, in network byte order
 */
uint16_t computeChecksum(
    bool const cleanChecksum, IpHdr const* const ipHdr) const
{
    // computed TCP length, which is also the original TCP segment length
    size_t const protocolSize = ipHdr->computeProtocolSize();
    size_t const n = 12 + protocolSize;
    // heap buffer instead of a variable-length array,
    // which is not standard C++
    std::vector<uint8_t> buffer(n);
    uint8_t* const data = buffer.data();
    // ipHdr srcIP
    ::memcpy(data, &(ipHdr->sourceIpAddress), 4);
    // ipHdr dstIP
    ::memcpy(data + 4, &(ipHdr->destinationIpAddress), 4);
    // reserved
    data[8] = 0x00;
    // protocol
    data[9] = ipHdr->protocol;
    // computed TCP length, high byte first (network byte order)
    data[10] = (protocolSize & 0xff00) >> 8;
    data[11] = protocolSize & 0xff;
    // original TCP segment
    tcphdr* const tcpHdr = reinterpret_cast<tcphdr*>(data + 12);
    ::memcpy(tcpHdr, this->tcpHdr, protocolSize);
    // zero the original checksum field when requested
    if (cleanChecksum) {
        tcpHdr->check = 0;
    }
    // compute over pseudo header + TCP segment
    return ethpacket::ComputeChecksum(
        reinterpret_cast<uint16_t const*>(data), n);
}
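
The pseudo-header layout from the table can also be assembled on its own, without the IpHdr class. The following is only a minimal sketch: BuildTcpPseudoHeader and its parameters are hypothetical names introduced for illustration, and the addresses are assumed to be passed as four bytes each, already in network byte order.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the original code): assembles the 12-byte
// TCP pseudo header. srcIp and dstIp are assumed to hold the four address
// bytes in network byte order; tcpLength is the TCP header plus data size.
std::array<std::uint8_t, 12> BuildTcpPseudoHeader(
    std::array<std::uint8_t, 4> const& srcIp,
    std::array<std::uint8_t, 4> const& dstIp,
    std::uint16_t const tcpLength)
{
    std::array<std::uint8_t, 12> pseudo{};
    for (std::size_t i = 0; i < 4; ++i) {
        pseudo[i] = srcIp[i];      // bytes 0..3: source address
        pseudo[4 + i] = dstIp[i];  // bytes 4..7: destination address
    }
    pseudo[8] = 0x00;  // reserved
    pseudo[9] = 0x06;  // protocol number for TCP
    // computed TCP length, high byte first
    pseudo[10] = static_cast<std::uint8_t>(tcpLength >> 8);
    pseudo[11] = static_cast<std::uint8_t>(tcpLength & 0xff);
    return pseudo;
}
```

The checksum input is then these 12 bytes followed by the TCP segment, as in computeChecksum above.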

Ethernet packet checksum

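Compute the checksum used in the headers carried by Ethernet packets.
This is the 16-bit ones' complement checksum used by IP, TCP, and UDP,
not the CRC-32 frame check sequence of the Ethernet frame itself.

Algorithm: first sum all 16-bit words (padding the last one if needed)
and add the carry (or carries) back into the result,
then compute the ones' complement:
– when the sender or the receiver computes the sender's checksum, the checksum field must first be set to 0
– then accumulate every 16-bit word
– then "fold" the final result into 16 bits
– then take the ones' complement (invert every bit)
– finally keep only the low 16 bits; the result is in network byte order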

/**
 * Compute the checksum used in the headers of Ethernet packets.
 * Requires <cstring> (memcpy).
 * @param data data to compute, in network byte order
 * @param bytes data size in bytes
 * @return the checksum, in network byte order
 */
uint16_t ComputeChecksum(uint16_t const* const data, size_t const bytes)
{
    // initialize sum to zero
    unsigned long sum = 0;
    // n keeps the number of bytes left
    size_t n = bytes;
    // index of the current 16-bit word
    size_t dataIndex = 0;
    // accumulate sum; the words are added as they lie in memory, which
    // is valid because the ones' complement sum is byte-order independent
    for (; n > 1; n -= 2) {
        sum += data[dataIndex++];
    }
    /*
     * If one byte is left, pad it with 0x00 and add it as the last
     * 16-bit word. In network byte order that word is the last byte
     * followed by 0x00, so the last byte sits at the lower address.
     * Reading data[dataIndex] would run past the end of the buffer, so
     * copy the single byte into the first byte of a zeroed 16-bit word
     * instead. This gives the word the same memory layout as the others
     * on any host byte order.
     */
    if (n > 0) {
        uint16_t last = 0;
        ::memcpy(
            &last, reinterpret_cast<uint8_t const*>(data) + bytes - 1, 1);
        sum += last;
    }
    // fold sum to 16 bits: add the carries back into the result
    while (sum >> 16) {
        sum = (sum >> 16) + (sum & 0xffff);
    }
    /*
     * ones' complement:
     * The ones' complement of a binary number is defined as the value
     * obtained by inverting all the bits in the binary representation of the
     * number (swapping 0s for 1s and vice versa)
     */
    return uint16_t(~sum);
}
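
As a worked check of the steps above, here is a minimal byte-wise sketch that does not depend on host byte order. InternetChecksum is a hypothetical name, not the ComputeChecksum above, and unlike that function it returns the checksum as an ordinary number whose high byte goes first on the wire. It assumes packet-sized inputs, so the 32-bit accumulator cannot overflow.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical byte-wise variant for illustration: builds each 16-bit word
// explicitly as (high byte << 8) | low byte, so the result is the same on
// little-endian and big-endian hosts.
std::uint16_t InternetChecksum(
    std::uint8_t const* const data, std::size_t const bytes)
{
    std::uint32_t sum = 0;
    std::size_t i = 0;
    // accumulate every complete 16-bit word
    for (; i + 1 < bytes; i += 2) {
        sum += (std::uint32_t(data[i]) << 8) | data[i + 1];
    }
    // odd trailing byte: it is the high byte, padded with 0x00
    if (i < bytes) {
        sum += std::uint32_t(data[i]) << 8;
    }
    // fold the carries back into the low 16 bits
    while (sum >> 16) {
        sum = (sum >> 16) + (sum & 0xffff);
    }
    // ones' complement, keeping only the low 16 bits
    return static_cast<std::uint16_t>(~sum);
}
```

For the 20-byte IPv4 header 45 00 00 73 00 00 40 00 40 11 00 00 c0 a8 00 01 c0 a8 00 c7 (checksum field zeroed), the sketch returns 0xb861. With b8 61 stored in bytes 10 and 11, running it again over the same 20 bytes returns 0, which is how a receiver can verify the checksum.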

Parallel computing

OpenMP

compile

  • compiler option
-fopenmp
  • or cmake
find_package(OpenMP)
if(OPENMP_FOUND)
  message(STATUS "OPENMP FOUND")
  set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
  set(CMAKE_EXE_LINKER_FLAGS
    "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS}")
  add_definitions(-DHAVE_OPENMP=1)
else()
  message(STATUS "OPENMP NOT FOUND")
  add_definitions(-DHAVE_OPENMP=0)
endif()
if("${USE_OPENMP}" STREQUAL "")
  if(OPENMP_FOUND)
    add_definitions(-DUSE_OPENMP=1)
    message(STATUS "use OPENMP")
  else()
    add_definitions(-DUSE_OPENMP=0)
    message(STATUS "not use OPENMP")
  endif()
else()
  if(USE_OPENMP)
    if(NOT OPENMP_FOUND)
      message(FATAL_ERROR "USE_OPENMP true but OPENMP NOT FOUND")
    endif()
    add_definitions(-DUSE_OPENMP=1)
    message(STATUS "use OPENMP")
  else()
    add_definitions(-DUSE_OPENMP=0)
    message(STATUS "not use OPENMP")
  endif()
endif()
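
On the C++ side, the definitions added by the script above can guard OpenMP-specific code. A minimal sketch, assuming the USE_OPENMP definition from that script; MaxWorkerThreads is a hypothetical helper name. When the macro is undefined or 0, the code falls back to a single thread.

```cpp
#if USE_OPENMP
#   include <omp.h>
#endif

// Hypothetical helper: number of threads a parallel region may use,
// or 1 when the build does not use OpenMP.
int MaxWorkerThreads()
{
#if USE_OPENMP
    return ::omp_get_max_threads();
#else
    return 1;
#endif
}
```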

notice

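To run work in parallel (e.g. to split the iterations of a for loop
across threads) rather than have every thread execute the whole
statement block, use parallel for on the for (..;..;) loop,
not just parallel (without for).
With plain parallel every thread runs the block once, so a loop that
has exactly one iteration per thread can drop the for (..;..;) itself.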
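
The difference can be made visible by counting loop-body executions. This is a minimal sketch with hypothetical function names; it assumes, as the tests in these notes do, that std::atomic can be used inside an OpenMP region. With a team of T threads, CountWithParallel returns iterations × T, while CountWithParallelFor returns iterations. Compiled without OpenMP the pragmas are ignored and both return iterations.

```cpp
#include <atomic>

// Plain `parallel`: every thread in the team executes the whole loop.
int CountWithParallel(int const iterations)
{
    std::atomic<int> count(0);
#pragma omp parallel default(shared)
    for (int i = 0; i < iterations; ++i) {
        ++count;
    }
    return count.load();
}

// `parallel for`: the iterations are divided among the threads,
// so each iteration runs exactly once.
int CountWithParallelFor(int const iterations)
{
    std::atomic<int> count(0);
#pragma omp parallel for default(shared)
    for (int i = 0; i < iterations; ++i) {
        ++count;
    }
    return count.load();
}
```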

examples

#if HAVE_OPENMP
/**
 * The pragma omp parallel is used to fork additional threads to carry out
 * the work enclosed in the construct in parallel.
 * The original thread will be denoted as master thread with thread ID 0.
 * Example (C program): Display parallel[%d] using multiple threads.
 * - Use flag -fopenmp to compile using GCC
 * - Uses shared and std::atomic
 * @sa
 * - http://www.openmp.org/wp-content/uploads/openmp-examples-4.5.0.pdf
 * - https://en.wikipedia.org/wiki/OpenMP#Thread_creation
 * - openmp-4.5.pdf
 */
TEST(openmp, shared)
{
    std::atomic<int> count(0);
#   pragma omp parallel default(shared) shared(count) \
        num_threads(::omp_get_max_threads())
    ::printf("parallel[%d]\n", ++count);
    // parallel done
    int const num = ::omp_get_max_threads();
    EXPECT_EQ(num, count);
}
/// Uses shared and std::atomic (2)
TEST(openmp, shared2)
{
    std::atomic<int> count(0);
    int const num = ::omp_get_max_threads();
    ::omp_set_num_threads(num);
#   pragma omp parallel default(shared) shared(count) num_threads(num)
    ::printf("parallel[%d]\n", ++count);
    // parallel done
    EXPECT_EQ(num, count);
}
/// The iterations of the for loop are divided among the threads
TEST(openmp, parallelFor)
{
    std::atomic<int> count(0);
    int const num = ::omp_get_max_threads();
#   pragma omp parallel for default(shared) shared(count) num_threads(num)
    for (int i = 0; i < num; ++i) { // each iteration runs exactly once
        ::printf("parallel[%d] i %d\n", ++count, i);
    }
    // parallel done: one increment per iteration, so expect num
    // rather than a hard-coded thread count
    EXPECT_EQ(num, count);
}
/// Same effect as parallelFor above, but written without the for loop
TEST(openmp, parallelForOmitFor)
{
    std::atomic<int> count(0);
#   pragma omp parallel default(shared) shared(count) \
        num_threads(::omp_get_max_threads())
    {
        ::printf("parallel[%d]\n", ++count);
    }
    // parallel done
    int const num = ::omp_get_max_threads();
    EXPECT_EQ(num, count);
}
/// The whole for loop is executed multiple times (once per thread)
TEST(openmp, multiFor)
{
    std::atomic<int> count(0);
    int i, c;
    int const num = ::omp_get_max_threads();
    ::omp_set_num_threads(num);
#   pragma omp parallel default(shared) shared(count) private(i, c) \
        num_threads(num)
    for (i = 0, c = ++count; i < c; ++i) { // every thread runs the whole loop
        ::printf("parallel[%d ~ %d]\n", i, c);
    }
    // parallel done
    EXPECT_EQ(num, count);
}
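/// Uses private
TEST(openmp, private)
{
    std::atomic<int> count(999);
#   pragma omp parallel default(shared) private(count) \
        num_threads(::omp_get_max_threads())
    {
        // each thread gets its own uninitialized copy of count, not the
        // value 999, so assign it before reading it
        ::printf("parallel[%d]\n", count = 20);
        // parallel done
    }
    // the original variable is not changed by the private copies
    EXPECT_EQ(999, count);
}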
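/// Uses private (2)
TEST(openmp, private2)
{
    int count = 999;
#   pragma omp parallel default(shared) private(count) \
        num_threads(::omp_get_max_threads())
    {
        // each thread gets its own uninitialized copy of count, not the
        // value 999, so assign it before reading it
        ::printf("parallel[%d]\n", count = 20);
        // parallel done
    }
    // the original variable is not changed by the private copies
    EXPECT_EQ(999, count);
}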
#endif

use future or shared future

example

cv::Mat_<int32_t> rightDisparity(
    this->leftRect.rows, this->leftRect.cols, 0);
cv::Mat_<int32_t> leftDisparity(
    this->leftRect.rows, this->leftRect.cols, 0);
std::future<void> leftDone(std::async(
    std::launch::async,
    boost::bind(&ParallelSADStereoMatch::computeDisparity, this, _1, _2),
    true, std::ref(leftDisparity)));
std::future<void> rightDone(std::async(
    std::launch::async,
    boost::bind(&ParallelSADStereoMatch::computeDisparity, this, _1, _2),
    false, std::ref(rightDisparity)));
rightDone.get();
leftDone.get();
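
The same pattern can be written with lambdas instead of boost::bind. Below is a minimal self-contained sketch in which FillRow stands in for computeDisparity; both the function and its data are hypothetical. Two tasks are launched with std::launch::async, each writes only to its own output, and get() blocks until the task has finished and rethrows any exception the task raised.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for computeDisparity: fills `out` with i for the
// left side and -i for the right side.
void FillRow(bool const left, std::vector<int>& out)
{
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = left ? int(i) : -int(i);
    }
}

// Runs both computations concurrently and waits for both to finish.
void ComputeBoth(std::vector<int>& leftOut, std::vector<int>& rightOut)
{
    std::future<void> leftDone = std::async(
        std::launch::async, [&leftOut]() { FillRow(true, leftOut); });
    std::future<void> rightDone = std::async(
        std::launch::async, [&rightOut]() { FillRow(false, rightOut); });
    rightDone.get();
    leftDone.get();
}
```

Each task touches a different vector, so no locking is needed; the outputs are safe to read once both get() calls have returned.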

SIMD

SSE2

TODO


thread parallel for

example implementation

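// Requires <algorithm>, <cstddef>, <cstdlib>, <thread> and <vector>.
// I must be a random access iterator (or pointer).
template<typename I, typename F>
void ParallelFor(
    I const& first,
    I const& last,
    F&& f,
    int const threadsNum = 1,
    int const threshold = 1000)
{
    // total number of elements; nothing to do for an empty range
    ptrdiff_t const total = last - first;
    if (total <= 0) {
        return;
    }
    // at least one worker, which also guards against threadsNum == 0
    ptrdiff_t const workers = std::max(
        ptrdiff_t(1), ptrdiff_t(std::abs(threadsNum)));
    // chunk size: at least max(1, |threshold|) elements, otherwise the
    // range split evenly (rounded up) so at most `workers` threads start
    ptrdiff_t const group = std::max(
        std::max(ptrdiff_t(1), ptrdiff_t(std::abs(threshold))),
        (total + workers - 1) / workers);
    std::vector<std::thread> threads;
    for (ptrdiff_t offset = 0; offset < total; offset += group) {
        // clamp the chunk end so no iterator is advanced past last
        I const begin = first + offset;
        I const end = first + std::min(offset + group, total);
        threads.push_back(std::thread([begin, end, &f](){
            std::for_each(begin, end, f); }));
    }
    std::for_each(threads.begin(), threads.end(), [](std::thread& t){
        t.join(); });
}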
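
A usage sketch for ParallelFor follows. So that the block compiles on its own, it repeats a condensed variant of the template; that variant and the SquareAll helper are illustrative assumptions, not part of the original notes. The call passes threshold 1 so that even a small range is split across several threads.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Condensed variant of ParallelFor, repeated so this block compiles alone.
template<typename I, typename F>
void ParallelFor(I const& first, I const& last, F&& f,
    int const threadsNum = 1, int const threshold = 1000)
{
    std::ptrdiff_t const total = last - first;
    if (total <= 0) {
        return;
    }
    std::ptrdiff_t const workers = std::max(1, std::abs(threadsNum));
    std::ptrdiff_t const group = std::max<std::ptrdiff_t>(
        std::max(1, std::abs(threshold)), (total + workers - 1) / workers);
    std::vector<std::thread> threads;
    for (std::ptrdiff_t offset = 0; offset < total; offset += group) {
        I const begin = first + offset;
        I const end = first + std::min(offset + group, total);
        threads.push_back(std::thread([begin, end, &f]() {
            std::for_each(begin, end, f); }));
    }
    for (std::thread& t : threads) {
        t.join();
    }
}

// Hypothetical example: squares every element in place, using up to
// four threads that each work on a disjoint chunk of the vector.
void SquareAll(std::vector<int>& values)
{
    ParallelFor(values.begin(), values.end(),
        [](int& v) { v *= v; }, 4, 1);
}
```

Because each thread works on a disjoint chunk, the callable needs no synchronization as long as it only touches the element it is given.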