Part#4: NGFW Architecture and Life of Packet

Palo Alto Networks, Fortinet, Checkpoint

Jul 19, 2023

This article is Part#4 of a series covering the high-level architecture and packet path of the top three hardware NGFW vendors, namely Palo Alto Networks, Fortinet and Checkpoint. The links to earlier editions Part#1, Part#2 & Part#3 are below.

Software Architecture and packet processing path are the foundational aspects of the design of NGFW. The architecture needs to stand the test of time with regard to the ever-growing need for more features, performance, reliability, scalability and stability.

PAN’s Single Pass Parallel Processing (SP3):

Palo Alto Networks Next-Generation Firewall’s main strength is its Single Pass Parallel Processing (SP3) Architecture, which comprises two key components:

Single Pass Software
Parallel Processing Hardware

Single Pass Software

Single Pass Software performs functions such as networking, user identification (User-ID), policy lookup, traffic classification with application identification (App-ID), decoding, and signature matching for identifying threats and content. Each of these is performed once per packet, as shown in the illustration below:

Processing a packet in a single pass greatly reduces processing overhead; firewalls from other vendors that use a different architecture incur significantly higher overhead for packets traversing the firewall. Unified Threat Management (UTM) devices, which process traffic using a multi-pass architecture, have been observed to add processing overhead, introduce latency and degrade throughput.

Stream-Based Signature Engine

Single Pass Software scans content as a single stream and uses a uniform signature format to detect and block threats. This removes the need for separate scan engines and signature sets, which results in low latency and high throughput. The use of a stream-based engine replaces several components commonly used in other solutions: a file proxy for data, virus, and spyware; a signature engine for vulnerability exploits; and an HTTP decoder for URL filtering. Using one common engine offers two key benefits:

Unlike file proxies that need to download the entire file before they can scan the traffic, a stream-based engine scans traffic in real time, reassembling packets only as needed and only in very small amounts.
Unlike traditional approaches, all traffic can be scanned with a single engine, instead of multiple scanning engines.
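To make the contrast with a file proxy concrete, here is a minimal Python sketch of stream-based scanning. It is not PAN's engine: the function name `scan_stream` and the naive substring search are assumptions made for illustration. The point it shows is that only a very small tail of already-seen bytes is kept between chunks, just enough to catch a signature that straddles a chunk boundary, rather than buffering the whole file.

```python
from typing import Iterable, List, Tuple


def scan_stream(chunks: Iterable[bytes], signatures: List[bytes]) -> List[Tuple[int, bytes]]:
    """Scan a byte stream chunk by chunk; return sorted (stream_offset, signature) hits."""
    if not signatures:
        return []
    # Keep only enough trailing bytes to catch a signature split across chunks.
    keep = max(len(s) for s in signatures) - 1
    tail = b""
    consumed = 0  # number of stream bytes that precede `tail`
    hits = set()  # a set, because a short signature inside the tail can be seen twice
    for chunk in chunks:
        window = tail + chunk
        for sig in signatures:
            start = window.find(sig)
            while start != -1:
                hits.add((consumed + start, sig))
                start = window.find(sig, start + 1)
        new_tail = window[-keep:] if keep else b""
        consumed += len(window) - len(new_tail)
        tail = new_tail
    return sorted(hits)


# The signature is split across two chunks, yet it is still found.
hits = scan_stream([b"GET /ev", b"il.exe HTTP"], [b"evil.exe"])
```

A production engine would use a multi-pattern automaton instead of repeated `find` calls, but the memory behaviour (a bounded carry-over buffer) is the property the text describes.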

Parallel Processing Hardware

Parallel Processing hardware ensures that function-specific processing is done in parallel at the hardware level, which, in combination with the dedicated Data plane and Control plane, delivers high performance. Separating the Data plane and Control plane ensures that heavy utilization of either plane does not impact the overall performance of the platform. It also means that neither plane depends on the other, as each has its own CPU and RAM, as illustrated in the diagram below:

Below is the PA-400 series hardware architecture depicting the resource separation between data-plane and control plane.

LIFE OF PACKET

INGRESS STAGE:

The ingress stage receives packets from the network interface, parses those packets, and then determines whether a given packet is subject to further inspection. If the packet is subject to further inspection, the firewall continues with a session lookup and the packet enters the security processing stage. Otherwise, the firewall forwards the packet to the egress stage.

Packet Parsing: Packet parsing starts with the Ethernet (Layer-2) header, followed by the IPv4/IPv6 (Layer-3) and TCP/UDP (Layer-4) headers. The headers are validated and the packet is dropped on any protocol violation. The ingress port, 802.1q tag, and destination MAC address are used as keys to look up the ingress logical interface. If the interface is not found, the packet is discarded.
Tunnel Decapsulation: After parsing the packet, if the firewall determines that it matches a tunnel, i.e. IPSec, SSL-VPN with SSL transport, then it performs the following sequence:
- The firewall decapsulates the packet first and discards it if errors exist.
- The tunnel interface associated with the tunnel is assigned to the packet as its new ingress interface and then the packet is fed back through the parsing process, starting with the packet header defined by the tunnel type.
IP Defragmentation: The firewall parses IP fragments, reassembles using the defragmentation process, and then feeds the packet back to the parser starting with the IP header. At this stage, a fragment may be discarded due to tear-drop attack (overlapping fragments), fragmentation errors, or if the firewall hits system limits on buffered fragments (hits the max packet threshold).
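The defragmentation step can be sketched as follows. This is an illustrative Python model only, not PAN-OS code: the `Reassembler` class, the key layout and the `MAX_BUFFERED_FRAGMENTS` value are all assumptions. It shows the two discard conditions named above, overlapping fragments and a buffered-fragment limit, and returns the reassembled payload once the pieces are contiguous.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

MAX_BUFFERED_FRAGMENTS = 4  # hypothetical system limit, chosen for the example


@dataclass
class Reassembler:
    # key: (src, dst, protocol, ip_id) -> list of (offset, payload, more_fragments)
    pending: Dict[Tuple, List[Tuple[int, bytes, bool]]] = field(default_factory=dict)

    def add(self, key: Tuple, offset: int, payload: bytes, more: bool) -> Optional[bytes]:
        """Return the reassembled payload once complete, else None.

        Raises ValueError when the fragment (and its datagram) must be discarded.
        """
        frags = self.pending.setdefault(key, [])
        end = offset + len(payload)
        for o, p, _ in frags:
            if offset < o + len(p) and o < end:
                del self.pending[key]
                raise ValueError("overlapping fragment (tear-drop style)")
        if len(frags) >= MAX_BUFFERED_FRAGMENTS:
            del self.pending[key]
            raise ValueError("fragment buffer limit reached")
        frags.append((offset, payload, more))
        frags.sort()
        # Complete only when the highest-offset fragment is the last one
        # and all pieces are contiguous starting from offset 0.
        if frags[-1][2]:
            return None
        expected = 0
        for o, p, _ in frags:
            if o != expected:
                return None
            expected += len(p)
        del self.pending[key]
        return b"".join(p for _, p, _ in frags)
```

Offsets here are in bytes for readability; on the wire, the IPv4 fragment offset field counts 8-byte units.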

FIREWALL SESSION LOOKUP:

If the packet is subject to firewall inspection, the firewall performs a flow lookup on the packet. A firewall session consists of two unidirectional flows, each uniquely identified. The firewall identifies a flow using a 6-tuple key (Src IP, Dst IP, Src Port, Dst Port, Protocol, Zone) and stores active flows in the flow lookup table. When a packet is determined to be eligible for firewall inspection, the firewall extracts the 6-tuple flow key from the packet and then performs a flow lookup to match the packet with an existing flow.
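As a rough illustration of the flow lookup, the Python sketch below models a session as two unidirectional flows that both point at the same session object. The names `FlowKey`, `FlowSession` and `FlowTable` are hypothetical, and NAT is deliberately ignored (with NAT, the reverse flow would carry the translated addresses).

```python
from dataclasses import dataclass
from typing import Dict, NamedTuple, Optional


class FlowKey(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    zone: str  # ingress zone for this direction


@dataclass
class FlowSession:
    session_id: int
    c2s: FlowKey  # client-to-server flow
    s2c: FlowKey  # server-to-client flow


class FlowTable:
    def __init__(self) -> None:
        self._flows: Dict[FlowKey, FlowSession] = {}
        self._next_id = 1

    def install(self, c2s: FlowKey, server_zone: str) -> FlowSession:
        """Create a session and register both of its unidirectional flows."""
        s2c = FlowKey(c2s.dst_ip, c2s.src_ip, c2s.dst_port, c2s.src_port,
                      c2s.protocol, server_zone)
        session = FlowSession(self._next_id, c2s, s2c)
        self._next_id += 1
        self._flows[c2s] = session
        self._flows[s2c] = session
        return session

    def lookup(self, key: FlowKey) -> Optional[FlowSession]:
        """Return the owning session, or None so the caller takes the slow path."""
        return self._flows.get(key)
```

A hit in `lookup` corresponds to the fast path; a miss means the packet is treated as the first packet of a new session.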

Zone Protection Checks: After the packet arrives on a firewall interface, the ingress interface information is used to determine the ingress zone. If any zone protection profiles exist for that zone, the packet is subject to evaluation based on the profile configuration.
TCP state check: If the first packet in a session is a TCP packet and it does not have the SYN bit set, the firewall discards it (default). If SYN flood settings are configured in the zone protection profile and the action is set to SYN Cookies, then TCP SYN cookie is triggered if the number of SYN packets matches the activate threshold.
Forwarding: This stage determines the packet-forwarding path. Packet forwarding depends on the configuration of the interface.

NAT Policy Look-up: This is applicable only in Layer-3 or Virtual Wire mode. At this stage, the ingress and egress zone information is available. The firewall evaluates NAT rules for the original packet.
- For destination NAT, the firewall performs a second route lookup for the translated address to determine the egress interface/zone.
- For source NAT, the firewall evaluates the NAT rule for source IP allocation. If the allocation check fails, the firewall discards the packet.
User-ID: The firewall uses the IP address of the packet to query the User-IP mapping table (maintained per VSYS) and fetches the corresponding user information. It then uses this user information to query the user-group mapping table and fetches the group mapping associated with the user. If user information is not available at this point and a captive portal policy is set up, the firewall attempts to find the user information via captive portal authentication.
DoS Protection Policy Lookup: Next, the firewall checks the DoS (Denial of Service) protection policy for traffic thresholds based on the DoS protection profile. If the DoS protection policy action is set to “Protect”, the firewall checks the specified thresholds and if there is a match (DoS attack detected), it discards the packet.
Security Policy Lookup: At this stage, the ingress and egress zone information is available. The firewall uses application ANY to perform the lookup and check for a rule match. In case of a rule match, if the policy action is set to ‘deny’, the firewall drops the packet. The firewall denies the traffic if there is no security rule match. The firewall permits intra-zone traffic by default. You can modify this default behavior for intra-zone and inter-zone traffic from the security policies rulebase.
Session Allocation: The firewall allocates a new session entry from the free pool after all of the above steps are successfully completed.
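The first-packet security policy lookup described in this list can be approximated with a small Python sketch. The `Rule` fields and the function name are assumptions for illustration. Because the application is not yet known, the application is treated as ANY and is simply not part of the match; the fallback mirrors the defaults stated above (intra-zone allowed, everything else denied).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    name: str
    from_zone: str           # a zone name or "any"
    to_zone: str             # a zone name or "any"
    dst_port: Optional[int]  # None means any port
    action: str              # "allow" or "deny"


def first_packet_verdict(rules: List[Rule], from_zone: str, to_zone: str, dst_port: int) -> str:
    """First-match lookup for a new session; the application is treated as ANY."""
    for rule in rules:
        zone_ok = rule.from_zone in (from_zone, "any") and rule.to_zone in (to_zone, "any")
        port_ok = rule.dst_port is None or rule.dst_port == dst_port
        if zone_ok and port_ok:
            return rule.action
    # No rule matched: intra-zone traffic is permitted by default, the rest is denied.
    return "allow" if from_zone == to_zone else "deny"
```

Only when this lookup returns "allow", and the earlier checks have passed, would a session entry be allocated from the free pool.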

FIREWALL SESSION FAST PATH:

A packet that matches an existing session will enter the fast path. This stage starts with Layer-2 to Layer-4 firewall processing:

If the session is in discard state, then the firewall discards the packet. The firewall can mark a session as being in the discard state due to a policy action change to deny, or threat detection.
If the session is active, refresh session timeout.
If the packet is a TCP FIN/RST, the session TCP half closed timer is started if this is the first FIN packet received (half closed session) or the TCP Time Wait timer is started if this is the second FIN packet or RST packet. The session is closed as soon as either of these timers expires.
If NAT is applicable, translate the L3/L4 header as applicable.
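The fast-path state handling above can be sketched in a few lines of Python. The timer values and the names `FastPathSession` and `fast_path` are hypothetical; real timeouts are configurable and NAT rewriting is omitted.

```python
from dataclasses import dataclass

# Hypothetical timer values in seconds, chosen only for the example.
SESSION_TIMEOUT = 3600
HALF_CLOSED_TIMEOUT = 120
TIME_WAIT_TIMEOUT = 15


@dataclass
class FastPathSession:
    state: str = "active"  # "active" or "discard"
    fin_seen: int = 0
    expires_at: float = 0.0


def fast_path(session: FastPathSession, now: float, fin: bool = False, rst: bool = False) -> str:
    """Apply the L2-L4 fast-path rules to one packet of an existing session."""
    if session.state == "discard":
        return "drop"
    if rst or (fin and session.fin_seen >= 1):
        # Second FIN or any RST: start the Time Wait timer.
        session.expires_at = now + TIME_WAIT_TIMEOUT
    elif fin:
        # First FIN: the session becomes half closed.
        session.expires_at = now + HALF_CLOSED_TIMEOUT
    elif session.fin_seen == 0:
        # Ordinary packet on an open session: refresh the session timeout.
        session.expires_at = now + SESSION_TIMEOUT
    if fin:
        session.fin_seen += 1
    return "forward"
```

In this sketch, data packets that arrive after a FIN do not refresh the timeout, so the closing timers keep running; that is a simplifying assumption rather than documented behaviour.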

If an application uses TCP as the transport, the firewall processes the traffic through the TCP reassembly module before it sends the data stream into the security-processing module. The TCP reassembly module also performs window checks and buffers out-of-order data while skipping TCP retransmissions. The firewall drops the packets if there is a reassembly error or if it receives too many out-of-order fragments, resulting in the reassembly buffers filling up.
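A simplified Python model of the reassembly behaviour described above is shown here. It is a sketch under stated assumptions: the class name, the `max_buffered` limit and the use of `BufferError` are invented for the example, and sequence-number wraparound and window checks are left out.

```python
from typing import Dict


class TcpReassembler:
    def __init__(self, initial_seq: int, max_buffered: int = 8) -> None:
        self.next_seq = initial_seq
        self.max_buffered = max_buffered  # hypothetical buffer limit
        self.out_of_order: Dict[int, bytes] = {}

    def push(self, seq: int, data: bytes) -> bytes:
        """Return the bytes that became deliverable in order (possibly empty).

        Raises BufferError when the out-of-order buffer is full.
        """
        if seq + len(data) <= self.next_seq:
            return b""  # pure retransmission of data already delivered: skip it
        if seq > self.next_seq:
            if seq not in self.out_of_order and len(self.out_of_order) >= self.max_buffered:
                raise BufferError("reassembly buffer full")
            self.out_of_order[seq] = data  # hold until the gap is filled
            return b""
        # Segment starts at or before next_seq: trim any already-seen prefix.
        out = data[self.next_seq - seq:]
        self.next_seq += len(out)
        # Drain buffered segments that are now contiguous.
        while self.next_seq in self.out_of_order:
            nxt = self.out_of_order.pop(self.next_seq)
            out += nxt
            self.next_seq += len(nxt)
        return out
```

The in-order bytes returned by `push` are what would be handed to the security-processing module as a stream.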

Security Processing: A packet matching an existing session is subject to further processing. If the firewall does not detect the session application, it performs an App-ID lookup. If App-ID lookup is non-conclusive, the content inspection module runs known protocol decoder checks and heuristics to help identify the application.

Captive Portal: If the user information was not available for the source IP address extracted from the packet, and the packet is destined to TCP/80, the firewall performs a captive portal rule lookup to see if the packet is subject to captive portal authentication. If captive portal is applicable, the packet is redirected to the captive portal daemon.

APPLICATION IDENTIFICATION (App-ID):

The firewall first performs an application-override policy lookup to see if there is a rule match. If there is, the application is known and content inspection is skipped for this session. If there is no application-override rule, then application signatures are used to identify the application. The firewall uses protocol decoding in the content inspection stage to determine if an application changes from one application to another.

After the firewall identifies the session application, access control, content inspection, traffic management and logging are set up as configured.

Security policy lookup: The identified application as well as IP/port/protocol/zone/user/URL category in the session is used as key to find rule match.
If the security policy has logging enabled at session start, the firewall generates a traffic log each time the App-ID changes throughout the life of the session.
If the security policy action is set to allow and the rule has an associated profile and/or the application is subject to content inspection, then the firewall passes all content through Content-ID.
If the security policy action is set to allow, the firewall performs a QoS policy lookup and assigns a QoS class based on the matching policy.
If the security policy action is set to allow and the application is SSL or SSH, the firewall performs a decryption policy lookup and sets up proxy contexts if there is a matching decryption rule.

CONTENT INSPECTION:

The firewall performs content inspection, if applicable, where protocol decoders decode the flow and the firewall parses and identifies known tunneling applications. If the identified application changes due to this, the firewall consults the security policies once again to determine if the session should be permitted to continue. If the application does not change, the firewall inspects the content as per all the security profiles attached to the original matching rule. If it results in threat detection, then the corresponding security profile action is taken.

EGRESS:

The firewall identifies a forwarding domain for the packet, based on the forwarding setup (discussed earlier). The firewall performs QoS shaping as applicable in the egress process. Also, based on the MTU of the egress interface and the fragment bit settings on the packet, the firewall carries out fragmentation if needed. If the egress interface is a tunnel interface, then IPSec/SSL-VPN tunnel encryption is performed and packet forwarding is reevaluated.

Finally, the packet is transmitted out of the physical egress interface.
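The egress fragmentation decision depends on the MTU and the don't-fragment bit. The Python sketch below follows standard IPv4 fragmentation arithmetic (every fragment except the last carries a multiple of 8 payload bytes); the function name is hypothetical, and raising an error when DF is set is only a placeholder for whatever the device actually does in that case.

```python
from typing import List, Tuple


def fragment(payload_len: int, mtu: int, dont_fragment: bool,
             ip_header_len: int = 20) -> List[Tuple[int, int, bool]]:
    """Return (offset_in_8_byte_units, length, more_fragments) for each fragment."""
    if ip_header_len + payload_len <= mtu:
        return [(0, payload_len, False)]  # fits: no fragmentation needed
    if dont_fragment:
        raise ValueError("packet exceeds MTU and the DF bit is set")
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_frag = (mtu - ip_header_len) // 8 * 8
    if per_frag <= 0:
        raise ValueError("MTU too small to carry any payload")
    frags = []
    offset = 0
    while offset < payload_len:
        length = min(per_frag, payload_len - offset)
        frags.append((offset // 8, length, offset + length < payload_len))
        offset += length
    return frags


# A 4000-byte payload leaving a 1500-byte MTU interface splits into three fragments.
plan = fragment(4000, 1500, dont_fragment=False)
```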

Fortinet’s Parallel Path Processing (PPP):

A FortiGate inspects network traffic from the IP layer up through the application layer of the TCP/IP stack. The FortiGate uses security policies to do this inspection. Inspection steps depend on the FortiGate hardware such as whether the FortiGate has network processors like the NP6 and content processors like the CP8 and CP9. It also depends on the UTM/NGFW inspection mode (flow-based or proxy-based).

The FortiGate performs the following types of security inspection:

Kernel-based stateful inspection, that provides individual packet-based security within a basic session state.
Flow-based inspection, that takes a snapshot of content packets and uses pattern matching to identify security threats in the content.
Proxy-based inspection, that reconstructs content passing through the FortiGate and inspects the content for security threats.

Parallel Path Processing (PPP) uses the firewall policy configuration to choose from a group of parallel options to determine the optimal path for processing a packet. Most FortiOS features are applied through firewall policies and the features applied determine the path a packet takes. Using firewall policies you can impose UTM/NGFW processing on content traffic that may contain security threats (such as HTTP, email and so on). Many UTM/NGFW processes are offloaded and accelerated by CP8 or CP9 processors. Using the policy configuration you can apply a range of protection from basic IPS attack protection that looks for network-based attacks to full scale advanced threat management (ATM), application control, antivirus, DLP and so on.

Diagram 1: Packet flow in FortiGate without network processor offloading

Ingress: All packets accepted by a FortiGate pass through a network interface and are processed by the TCP/IP stack. Then if DoS policies have been configured the packet must pass through these as well as automatic IP integrity header checking. Incoming IPsec packets that match configured IPsec tunnels are decrypted after header checking is done.

Admission control: Checks that the packet is not from a source or headed to a destination on the quarantine list. If configured, admission control then imposes FortiTelemetry protection, which requires a device to have FortiClient installed before packets from it are allowed. Admission control can also impose captive portal authentication on ingress traffic.

Kernel: The following is performed in the kernel

Destination NAT checks the NAT table and determines if the destination IP address for incoming traffic must be changed using DNAT
Routing uses the routing table to determine the interface to be used by the packet as it leaves the FortiGate. Routing also distinguishes between local traffic and forwarded traffic
Stateful Inspection looks at the first packet of a session and looks in the policy table to make a security decision about the entire session. Stateful inspection looks at packet TCP SYN and FIN flags to identify the start and end of a session. It checks packet payload and sequence numbers to verify it as a valid session and that the data is not corrupted or poorly formed.
Session Helpers analyze the data in the packet bodies of some protocols and adjust the firewall to allow those protocols to send packets through the firewall.
User authentication added to security policies is handled by the stateful inspection, which is why Firewall authentication is based on IP address.
Device identification is applied if required by the matching policy
Local SSL VPN traffic is treated like special management traffic as determined by the SSL VPN destination port. Packets are decrypted and are routed to an SSL VPN interface. Policy lookup is then used to control how packets are forwarded to their destination outside the FortiGate. SSL encryption and decryption is offloaded to and accelerated by CP8 or CP9 processors.
Local management traffic includes administrative access, routing protocol communication, central management from FortiManager, communication with the FortiGuard network.

UTM/NGFW: UTM/NGFW processing depends on the inspection mode of the security policy, Flow-based (single pass architecture) or proxy-based.

Single Pass Flow-Based inspection uses single-pass Direct Filter Approach (DFA) pattern matching to identify and block security threats in real time. Packets are then subject to botnet checking to make sure they are not destined for known botnet addresses.

Proxy-Based inspection can apply both flow-based and proxy-based inspection. Packets initially encounter the IPS engine, which can apply single-pass flow-based IPS and Application Control. The packets are then sent to the proxy for proxy-based inspection. Proxy-based inspection can apply VoIP inspection, DLP, Email Filter (Anti-Spam), Web Filtering, Antivirus, and ICAP.

Kernel: Traffic is now in the process of exiting the FortiGate. The kernel uses the routing table to forward the packet out the correct exit interface. The kernel also checks the NAT table and determines if the source IP address for outgoing traffic must be changed using SNAT.

Egress: Before exiting the FortiGate, outgoing packets that are entering an IPsec VPN tunnel are encrypted and encapsulated. IPsec VPN encryption is offloaded to and accelerated by CP8 or CP9 processors. Traffic shaping is then imposed, if configured, followed by WAN Optimization. The packet is then processed by the TCP/IP stack and exits out the egress interface.

Network Processor (NP6) Offload & Acceleration:

On FortiGates with network processors, traffic that does not pose security threats can bypass UTM/NGFW processing and can be offloaded to the NP6 processors freeing up FortiGate processing resources for other higher risk traffic. This control allows you to improve network performance without compromising security.

The first packet of a session determines if the session can be offloaded. When there is no proxy-based UTM/NGFW, an NP6 processor can offload most of the sessions.

After the first packet, subsequent packets in an offloaded session skip routing, UTM/NGFW, and kernel processing and are just forwarded out the egress interface by the NP6 processor. As well, security measures such as DoS policies, ACL, and so on are accelerated by the NP6 processor.
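The offload decision can be pictured with a small Python sketch. This is a deliberately simplified model, not FortiOS logic: the `OffloadPolicy` flag, the `Forwarder` class and the path labels are assumptions, and the eligibility rule is reduced to "no proxy-based UTM/NGFW on the matching policy", whereas real eligibility depends on more conditions.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class OffloadPolicy:
    proxy_based_utm: bool = False  # simplified stand-in for the policy's inspection settings


@dataclass
class Forwarder:
    # session key -> egress interface, for sessions handed to the network processor
    offloaded: Dict[Tuple, str] = field(default_factory=dict)

    def handle(self, key: Tuple, policy: OffloadPolicy, egress: str) -> str:
        """Return which path this packet takes."""
        if key in self.offloaded:
            # Subsequent packets skip routing, UTM/NGFW and kernel processing.
            return "np6:" + self.offloaded[key]
        # First packet (or a non-offloadable session) is handled by the CPU.
        if not policy.proxy_based_utm:
            self.offloaded[key] = egress
        return "cpu"
```

Calling `handle` twice with the same key and an offloadable policy returns "cpu" for the first packet and an "np6:" path for the second, which mirrors the first-packet decision described above.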

Checkpoint:

The diagram below in this section outlines high-level packet flow. The various stages are described in detail below. The packet flow through a Security GW:

FW/Slow Path: FW Path is employed when acceleration is not possible. In this case each packet in the connection goes through the FW Kernel Inspection section and sometimes through the Content Inspection block, if policy requires that.
Accelerated Path: Accelerated Path is active when a connection can be accelerated with a template through the SecureXL device. In this case all individual packets within the connection will bypass both the FW Kernel section and the Content Inspection block.
Medium Path: Medium Path is a situation when opening and closing a connection is handled by SecureXL, while data flow needs some further inspection and hence goes through Content Inspection.
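The three paths can be summarised as a small decision function. This Python sketch is only a restatement of the list above; the function name and the two boolean inputs are hypothetical simplifications of what the gateway evaluates.

```python
def select_path(can_accelerate: bool, needs_content_inspection: bool) -> str:
    """Pick the packet path for a connection, following the three cases above."""
    if not can_accelerate:
        # Every packet goes through FW kernel inspection
        # (and content inspection if the policy requires it).
        return "slow"
    if needs_content_inspection:
        # SecureXL handles open/close; the data flow still goes through content inspection.
        return "medium"
    # All packets bypass both the FW kernel and content inspection.
    return "accelerated"
```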

Inbound Packet

The packet is taken from the wire by the firewall driver and enters a buffer. The packet is normalized in a structure called a chain which represents both the original packet and its current state in firewall processing. Chains (normalized packets) pass through chain modules, which can choose to pass a chain to the next module, drop or hold it.
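A chain and its modules can be modelled as a simple pipeline. In the Python sketch below the chain is a plain dictionary and the two example modules (`vpn_decrypt`, `stateless_checks`) are hypothetical stand-ins; the only behaviour taken from the text is that each module may pass the chain on, drop it, or hold it.

```python
from enum import Enum
from typing import Callable, Dict, List


class Verdict(Enum):
    PASS = "pass"
    DROP = "drop"
    HOLD = "hold"


Chain = Dict[str, object]  # normalized packet plus its processing state
ChainModule = Callable[[Chain], Verdict]


def run_chain(chain: Chain, modules: List[ChainModule]) -> Verdict:
    """Pass the chain through each module until one drops or holds it."""
    visited = chain.setdefault("visited", [])
    for module in modules:
        visited.append(module.__name__)
        verdict = module(chain)
        if verdict is not Verdict.PASS:
            return verdict
    return Verdict.PASS


def vpn_decrypt(chain: Chain) -> Verdict:
    # Placeholder: mark the chain as decrypted if it arrived encrypted.
    if chain.get("encrypted"):
        chain["encrypted"] = False
    return Verdict.PASS


def stateless_checks(chain: Chain) -> Verdict:
    # Placeholder: drop anything whose header was flagged as invalid.
    return Verdict.PASS if chain.get("valid_header") else Verdict.DROP
```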

VPN Decrypt

The first of these modules checks whether the packet is part of an IPsec VPN and decrypts if necessary.

The VPN kernel module decrypts the packet.
The decrypted (original) packet is inspected.

Inbound Stateless Checks

The firewall does preliminary “stateless” checks that do not require context in order to decide whether to accept a packet or not. For instance we check that the packet is a valid packet and if the header is compliant with RFC standards.

CONNECTION SETUP

A stateful firewall tracks the state of network connections in memory to identify other packets belonging to the same connection and to dynamically open connections that belong to the same session. The state tables include information about current connections, hosts, users and other information and are implemented as dynamic hash tables in kernel memory. The connection table includes a 5-tuple key to look up the connection. Before there is a connection table entry, the firewall must first check the connection against the firewall security policy to see if the connection is allowed.

SecureXL

One of two drivers will handle the first connection: SecureXL or firewall INSPECT. The SecureXL device is implemented either in software or in hardware and minimizes the connections that are processed by the INSPECT driver. SecureXL accelerates connections in two ways.

Throughput Acceleration: The first packets of a new TCP connection require more processing when processed by the firewall module. If the connection is eligible for acceleration, after minimal security processing the packet is offloaded to the SecureXL device associated with the proper egress interface. Subsequent packets of the connection can be processed on the accelerated path and directly sent from the inbound to the outbound interface via the SecureXL device.

Connection Rate Acceleration: SecureXL also improves the rate of new connections (connections per second) and the connection setup/teardown rate (sessions per second). To accelerate the rate of new connections, connections that do not match a specified 5 tuple are still processed by SecureXL. For example, the source port can be masked so that only the other 4 tuple attributes require a match. When a connection is processed on the accelerated path, SecureXL creates a template of that connection that does not include the source port tuple. A new connection that matches the other 4 tuples is processed on the accelerated path because it matches the template. The firewall module does not inspect the new connection, increasing firewall connection rates.
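The accept-template idea, matching on four of the five tuple fields, can be sketched in Python as follows. The class and function names are hypothetical; the sketch only shows how masking the source port lets a new connection from the same client to the same service hit an existing template.

```python
from typing import NamedTuple, Set, Tuple


class Conn(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: int


Template = Tuple[str, str, int, int]  # src_ip, dst_ip, dst_port, protocol


def to_template(conn: Conn) -> Template:
    """Mask the source port: only the other four attributes take part in the match."""
    return (conn.src_ip, conn.dst_ip, conn.dst_port, conn.protocol)


class TemplateAccelerator:
    def __init__(self) -> None:
        self.accept_templates: Set[Template] = set()

    def offload(self, conn: Conn) -> None:
        """Called once the firewall has accepted a connection eligible for acceleration."""
        self.accept_templates.add(to_template(conn))

    def handle_new(self, conn: Conn) -> str:
        """A new connection that matches a template skips firewall inspection."""
        return "accelerated" if to_template(conn) in self.accept_templates else "firewall"
```

A browser opening many connections to one server is the typical case: each connection uses a different ephemeral source port, yet all of them match the same template.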

SecureXL and the firewall module keep their own state tables and communicate updates to each other.

Connection offload - Firewall kernel passes the relevant information about the connection from firewall connections table to SecureXL connections table.
Connection notification - SecureXL passes the relevant information about accelerated connections that match an accept template.

In addition to accept templates the SecureXL device is also able to apply drop templates which are derived from security rules where the action is drop.

CoreXL:

CoreXL enhances performance by enabling multiple cores to perform tasks concurrently. CoreXL provides near linear scalability of performance. Traffic is distributed by one or more Secure Network Distributors (SND), working with a SecureXL instance, to the firewall instances running on the other cores. This is achieved through the following mechanisms:

Dynamic Dispatcher: The dynamic dispatcher monitors not only the current load of each CPU core but also the anticipated future load based on traffic queued for processing. When a stream or connection is dispatched to a particular core for processing it builds a table to ensure that subsequent packets within the same connection retain affinity to the same core.

Multi-queue: CPU cores are affined to an interface queue. Each affined buffer can “interrupt” its own CPU core allowing high volumes of inbound packets to be shared across multiple dispatchers.

Priority Queues: The Priority Queues functionality prioritizes control connections over data connections based on priority.
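The dynamic dispatcher's two properties, load-aware core selection and per-connection affinity, can be illustrated with a short Python sketch. The `Dispatcher` class and the abstract `cost` units are assumptions; a real dispatcher measures actual CPU load and queued traffic.

```python
from typing import Dict, Hashable, List


class Dispatcher:
    def __init__(self, cores: int) -> None:
        self.load: List[int] = [0] * cores       # current plus queued work per core
        self.affinity: Dict[Hashable, int] = {}  # connection -> core

    def dispatch(self, conn: Hashable, cost: int = 1) -> int:
        """Return the core for this packet, keeping a connection on one core."""
        core = self.affinity.get(conn)
        if core is None:
            # New connection: pick the least-loaded core and remember the choice.
            core = min(range(len(self.load)), key=self.load.__getitem__)
            self.affinity[conn] = core
        self.load[core] += cost
        return core
```

Once a connection is in the affinity table, its packets stay on the same core even if another core later becomes less loaded.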

PRE-CONTENT SECURITY INSPECTION:

Stream Assembly: The Streaming Engine processes the individual packet chains, creates an ordered packet stream and directly performs a number of security functions on the stream. The streaming engine passes the assembled stream to the protocol parsers.

Protocol Parsers: Protocol parser instances register with the streaming engine in order to receive ordered streams of data. They test for conformance to RFCs and look for anomalies. Another job of the parsers is to normalize content. The parsers' main purpose is to extract ‘contexts’ from the streams to prepare for the next level of inspection.

CONTENT SECURITY INSPECTION:

Context Management Infrastructure (CMI): It connects parsers with pattern matchers. CMI determines which protections should be activated on every context discovered by a protocol parser. If policy dictates that no protections should run, then the relevant parsers on this traffic are bypassed in order to improve performance and reduce potential false positives.

Pattern Matchers: Security modules each have a distinct function and they register with the CMI to receive particular context types. The CMI Loader maintains a register table of all security modules and the contexts they want to inspect. During policy installation, the CMI collects signatures from multiple sources (e.g. IPS and Application Control) and compiles them together into Pattern Matchers (PM).

Protections: Protections are composed of signatures that identify malicious activity. Compound Signature Identification (CSI) - Signatures on multiple parts of a packet, multiple parts of the protocols such as URL and an HTTP header, multiple parts of a connection or multiple connections. CSI constructs complex signatures that are triggered only if a certain logical condition over multiple contexts is matched.
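A compound signature, as described above, fires only when a logical condition over several contexts holds. The Python sketch below models that idea; the `Protection` type, the context names (`http_url`, `http_header`) and the sample condition are all invented for illustration and do not reflect Check Point's internal formats.

```python
from typing import Callable, Dict, List, NamedTuple, Set

Contexts = Dict[str, bytes]  # context type -> extracted content


class Protection(NamedTuple):
    name: str
    context_types: List[str]              # contexts this protection registered for
    condition: Callable[[Contexts], bool]  # logical condition over those contexts


def active_context_types(protections: List[Protection]) -> Set[str]:
    """Contexts that some protection wants; other parsers could be bypassed."""
    return {t for p in protections for t in p.context_types}


def inspect(contexts: Contexts, protections: List[Protection]) -> List[str]:
    """Return the names of protections whose compound condition is met."""
    hits = []
    for p in protections:
        if all(t in contexts for t in p.context_types) and p.condition(contexts):
            hits.append(p.name)
    return hits


# Hypothetical compound signature: suspicious URL AND a specific header value.
example = Protection(
    name="example-compound",
    context_types=["http_url", "http_header"],
    condition=lambda c: b"/admin" in c["http_url"] and b"curl" in c["http_header"],
)
```

With an empty list of protections, `active_context_types` returns an empty set, which corresponds to the case where policy dictates that no protections run and the relevant parsers can be skipped.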

FILE HANDLING:

In some situations we need to inspect an entire file for malicious behavior. In these cases the file is sent directly to Threat Emulation, Threat Extraction and Antivirus for processing. SandBlast Threat Extraction technology eliminates threats by removing exploitable content and reconstructing documents using known safe elements. Safe content is delivered to the user. Users are allowed access to original files after Threat Emulation completes its analysis.

OUTBOUND PACKET:

On outbound packets we apply source NAT, replacing the source IP address and/or port with a translated IP address and/or port. When encrypting a packet:

The outgoing packet is inspected by the firewall.
The VPN kernel module encrypts the packet.

Reference and credits:

Official websites of PAN, Fortinet and Checkpoint for datasheets and google.

TechStack