If you have ever written Verilog and wondered what happens between always @(posedge clk) and the bitstream that programs an FPGA, the answer is a synthesis tool. For most of computing history that tool was proprietary, opaque, and expensive. Yosys changed that. It is the first complete, freely available, production-quality RTL synthesis framework — and understanding how it works gives you a mental model for every synthesis tool you will ever use.
This post is aimed at someone new to FPGAs who knows how to write Verilog but has never thought carefully about what a synthesizer actually does. No background in EDA is assumed, but the technical machinery is explained in full.
The Synthesis Problem
Start with the question synthesis has to answer: given a behavioural description of a circuit — “this register should increment by one every clock cycle when en is high and reset to zero when rst is high” — produce a netlist: a graph of primitive cells (flip-flops, lookup tables, NAND gates, etc.) and the wires connecting them, which implements exactly that behaviour using the gates available in a specific target library.
This is harder than it sounds. The designer writes:
module counter (input clk, rst, en, output reg [2:0] count);
always @(posedge clk)
if (rst) count <= 3'd0;
else if (en) count <= count + 3'd1;
endmoduleThe target is, say, a cell library with NAND, NOR, NOT, and a positive-edge D flip-flop. The synthesizer must:
- Parse the Verilog and build an abstract representation of what the circuit means
- Elaborate the design — resolve parameters, flatten hierarchy, canonicalize expressions
- Optimize — eliminate redundant logic, recognize patterns (muxes, adders, FSMs)
- Map to technology — replace generic logic cells with library-specific cells
- Verify that the result is functionally equivalent to the input
None of these steps are trivial. Optimization step 3 alone is a collection of NP-hard problems (Boolean satisfiability, graph isomorphism, covering) that the tool approximates with heuristics. The mapping in step 4 involves solving a covering problem over a tree of logic gates, minimizing area and/or timing subject to a cost function from the technology library.
Yosys: Architecture
Yosys is organized as a framework rather than a monolithic program. The core provides:
- A Verilog (and RTLIL) frontend
- An internal representation called RTLIL
- A pass system where each transformation is a loadable plugin
- An ABC subprocess for combinational optimization and technology mapping
- Backends for output formats (Verilog, BLIF, EDIF, AIGER, SPICE, …)
The command interface, whether you use the interactive shell or a script file (.ys), is just a sequencer over these passes. synth, the most common entry point, is itself a pass that calls other passes in a well-tuned order.
RTLIL: The Internal Representation
Every Yosys pass reads and writes RTLIL — Register Transfer Level Intermediate Language. It is worth understanding this representation because it is what every pass sees.
An RTLIL design is a hierarchy of Module objects. Each module contains:
- Wires: typed (integer width), with
\prefix for user-defined names,$for generated names - Cells: instances of other modules or primitive cell types (
$dff,$mux,$add,$alu, …) - Connections: explicit wire-to-port assignments
The primitive cell types in RTLIL are generic — $dff is a positive-edge D flip-flop regardless of what library you target. Technology mapping is the process of replacing these generic cells with library-specific ones like FDRE (Xilinx) or DFF (a custom CMOS library).
After reading counter.v, the RTLIL contains something like:
module \counter
wire input 1 \clk
wire input 2 \rst
wire input 3 \en
wire output 4 width 3 \count
wire width 3 $procmux$3_Y
cell $dff $procdff$8
parameter \CLK_POLARITY 1'1
parameter \WIDTH 3
connect \CLK \clk
connect \D $procmux$3_Y
connect \Q \count
end
cell $add $add$counter.v:10$2
parameter \A_WIDTH 3
parameter \B_WIDTH 3
parameter \Y_WIDTH 3
connect \A \count
connect \B 3'001
connect \Y ...
end
...
end
This isn’t Verilog. There are no always blocks, no procedural assignments. The sequential process (always @(posedge clk)) has been reduced to a DFF cell with explicit data and clock connections. The conditional (if (rst) ... else if (en) ...) has become a chain of $mux cells. This transformation is the job of the PROC pass group.
The Pass System
A Yosys pass is a C++ class that overrides void execute(vector<string> args, Design *design). Passes are registered by name and invoked from scripts. The full synth flow for a generic target looks like:
hierarchy ; resolve module instantiation
proc ; eliminate always blocks → DFF + MUX cells
opt ; constant folding, dead code elimination
fsm ; detect and optimize finite state machines
opt ; second round
memory ; transform memory arrays to cell networks
opt ; third round
techmap ; map generic cells to technology primitives
opt -fast ; cleanup
abc ; combinational optimization and technology mapping
opt_clean ; remove dangling wires and cells
Each pass is individually documented and can be run standalone. opt_expr does constant folding. opt_merge deduplicates structurally identical cells (and uses structural hashing so it scales to large designs). opt_muxtree prunes dead branches in mux trees based on control signal analysis — it can eliminate entire subtrees of logic that could never be selected. opt_dff recognizes when a flip-flop’s reset value is the same as its next-state function under some condition, and collapses the reset into an enable signal or eliminates it entirely.
The FSM pass is particularly interesting. fsm_detect looks for the structural signature of a finite state machine — a DFF cluster where all outputs are functions of the state register and inputs. fsm_extract lifts it into a symbolic FSM representation. fsm_opt then applies FSM-specific optimizations (unreachable state elimination, state merging) that would be invisible to generic gate-level optimization. fsm_recode can re-encode the state register with a different encoding (binary, one-hot, gray code) before fsm_map converts it back to gates.
Optimization: What opt_dff Actually Did
The counter example reveals a non-trivial optimization. After PROC, the circuit has:
- A 3-bit
$dffcell - A
$muxfor the reset condition (output = 0 when rst, else count+1) - A second
$muxfor the enable condition (output = count+1 when en, else count) - A
$addfor the increment
opt_dff can observe that when rst=1, the DFF output becomes 3'000, which is a constant. It converts the bare $dff into a $sdffe — a DFF with synchronous reset and clock enable. The reset value is absorbed into the cell type, and the two muxes disappear. The cell count drops from ~7 cells to 2: one $sdffe and one $add.
This matters for FPGA targets because FDRE (Xilinx) and its equivalents have native synchronous reset and clock enable inputs. If the synthesizer fails to recognize this pattern, it synthesizes the muxes into LUTs that consume the enable and reset control logic inputs of the flip-flop’s CE and R pins, wasting routing resources and degrading timing. Recognizing the pattern lets the mapper drive those dedicated inputs directly.
ABC: The Combinational Oracle
After generic optimization, Yosys hands the combinational logic to ABC (A System for Sequential Synthesis and Verification), a tool developed at UC Berkeley. This is the step that does the heavy lifting of technology mapping.
ABC works with And-Inverter Graphs (AIGs) — a canonical representation of Boolean functions where every node is either an AND gate or a NOT (inverter), and any Boolean function can be expressed with these two. AIGs are attractive because they are compact, have efficient algorithms for functional equivalence checking, and are easy to optimize.
Yosys serializes the combinational logic to a BLIF (Berkeley Logic Interchange Format) file, ABC reads it, applies its optimization script, writes an output BLIF, and Yosys reads the result back. The handoff is temporary-file based, which is why you see <abc-temp-dir> in the logs.
A typical ABC script in Yosys’s abc -liberty invocation looks like:
read_blif input.blif
read_lib -w stdcells.lib
strash ; build AIG
&get -n; &fraig -x; &put ; functionally-reduced AIG
scorr ; sequential correlation removal
dc2 ; don't-care-based two-level optimization
dretime ; retiming (if sequential mode)
strash
&get -n; &dch -f; &nf; &put ; choice-based mapping
write_blif output.blif
The &dch step builds a “choice network” — a graph where each node has multiple functionally equivalent implementations, and the downstream mapper can pick the cheapest one given the timing context. &nf (node flow) does the actual technology-dependent mapping by selecting, for each cut of the AIG, the cheapest library cell that implements the function of that cut, subject to area and delay constraints.
For the counter with a 4-cell liberty library (NAND, NOR, NOT, DFF), ABC produces:
- 6 NAND + 6 NOR + 2 NOT cells for the combinational logic
- Balanced against delay: the critical path goes through 4 gate levels
This is the minimum achievable with that cell set for a 3-bit binary counter with synchronous enable and reset. A skilled logic designer reasoning from first principles would arrive at the same answer.
Technology Libraries: Liberty Format
The .lib file (Liberty format) is the contract between the synthesizer and the foundry or FPGA vendor. It specifies:
cell (NAND) {
area : 4;
pin (A) { direction : input; }
pin (B) { direction : input; }
pin (Z) {
direction : output;
function : "!(A*B)";
timing() {
related_pin : "A";
cell_rise(...) { ... }
cell_fall(...) { ... }
}
}
}
Every cell has an area (abstract units, usually proportional to transistor count), a Boolean function for each output, and timing arcs for each input-to-output path. The timing arcs are lookup tables over input slew rate and output capacitance — because gate delay depends on how fast the input transitions and how much capacitance the output is driving.
For FPGAs, the situation is different. FPGA synthesis (e.g., synth_xilinx) doesn’t use Liberty for LUTs because LUTs can implement any Boolean function of N inputs — there is no “function” field to specify. Instead, the mapper asks: can this logic cone be expressed as a 6-input function? (For Xilinx 7-series: yes, if the cone has at most 6 distinct inputs.) If so, it fits in one LUT6. If not, it is split across multiple LUTs, which the mapper tries to pack according to the FPGA’s carry-chain and LUT-pair structure.
The Basys3 counter example (synth_xilinx) produced:
- 42 FDRE — Xilinx’s flip-flop primitive with synchronous reset, clock enable, and configurable polarity
- 16 LUT5 — 5-input lookup tables for the combinational decode logic
- 1 BUFG — global clock buffer (mandatory for any clock signal driving > a few fanout points)
- 16 OBUF — output pad drivers
The 42 flip-flops come from the 26-bit counter plus the 16-bit LED register. The 16 LUT5s implement the combinational LED decode logic. The numbers are exact and predictable once you understand what synth_xilinx is doing.
The AIGER Backend: Synthesis for Formal Verification
Yosys is not only for implementation synthesis. The write_aiger backend produces AIGER format — a compact binary representation of an AIG with latch (sequential element) support. This is the input format for industrial model checkers.
The AIGER flow for formal verification is:
read_verilog -formal demo.v
prep -flatten -nordff -top demo
flatten demo; delete -output
memory_map; opt -full
techmap; opt -fast
abc -fast -g AND; opt_clean
write_aiger -map demo.aim demo.aig
The -formal flag to read_verilog enables assert, assume, and cover system tasks. assume constraints narrow the reachable state space; assert specifies properties the tool must prove hold in all reachable states.
The -g AND flag to abc restricts the output cell library to AND gates plus inverters — i.e., it forces the output to be a pure AIG. write_aiger then dumps this to a .aig file that can be fed to hardware model checkers like super_prove, avy, or pono.
The -map flag produces a .aim file that tracks the correspondence between AIG nodes and original Verilog signals, allowing the model checker’s counterexample trace (a sequence of input values that reaches a failing assert) to be mapped back to named signals in the original design.
dfflibmap: Closing the Sequential Mapping Gap
After ABC maps the combinational logic, there is still a gap: the flip-flops. ABC’s technology mapper understands combinational cells from the Liberty file, but it ignores sequential cells (Liberty reports them, but ABC’s Scl_LibertyReadGenlib() explicitly skips them). The log says so directly:
ABC: Scl_LibertyReadGenlib() skipped sequential cell "DFF".
ABC: Scl_LibertyReadGenlib() skipped sequential cell "DFFSR".
The dfflibmap pass handles this separately. It reads the Liberty file, finds cells whose function matches known DFF types ($_DFF_P_, $_DFF_N_, $_DFFSR_PPP_, etc.), and replaces the generic Yosys DFF cells with the library-specific ones. For the CMOS library this means $_DFF_P_ → \DFF, using the cell’s pin names from Liberty.
The DFFLEGALIZE pass (invoked automatically by dfflibmap) handles cases where the design needs a DFF type the library doesn’t provide. If you have a $_SDFFE_PP0P_ (DFF with synchronous reset-to-0, active-high clock enable, active-high clock) but the library only has a plain $_DFF_P_, DFFLEGALIZE synthesizes the missing control logic from the available cell types.
Reading the Statistics
At the end of synthesis, stat prints the cell count and area:
=== counter ===
17 108 cells
3 54 DFF ; 3 × 18 area units each
6 24 NAND ; 6 × 4 area units each
6 24 NOR
2 6 NOT
Chip area for module '\counter': 108.000000
of which used for sequential elements: 54.000000 (50.00%)
Half the area is sequential (the three DFFs — one per bit of the 3-bit counter). The other half is combinational: the increment logic and the muxing for reset and enable. For a 3-bit counter this is sensible — the combinational logic depth is log₂(3) ≈ 1.6 gate levels, and the overhead of reset and enable adds a few more.
The area numbers are abstract — they come from the area field in the Liberty file, which was set to small integers for this toy library. In a real 65 nm standard cell library, area is reported in µm² and the numbers get interesting fast.
Why This Matters for FPGA Development
Most FPGA developers use vendor toolchains: Vivado for Xilinx, Quartus for Intel, Radiant for Lattice. These are excellent tools and you should use them for production work. But they are black boxes. When Vivado tells you that your design failed timing on a particular path, understanding what optimization it applied (or failed to apply) requires a mental model of how synthesis works.
Yosys provides that model. Because every pass is documented and the source is readable, you can step through what happens:
synth -run begin:fineruns only up to thefinecheckpoint, leaving ABC out of the pictureshowrenders the current netlist as a graphical circuit diagramwrite_rtlildumps the internal representation for inspectionselect -module counter; dumpshows exactly what cells and connections are in a module
The ability to inspect intermediate states is invaluable when debugging synthesis issues. “Why does this reset not synthesize to the FDRE R pin?” is a question Yosys can answer by letting you inspect the netlist before and after opt_dff and dfflibmap.
For devices Yosys directly supports — Lattice iCE40, ECP5, Gowin, Xilinx 7-series/Ultrascale, Anlogic, Efinix — Yosys is used in the open-source toolchain (with nextpnr for placement and routing). These flows are fully reproducible and scriptable. The entire bitstream generation pipeline is auditable down to source code, which matters for some security-sensitive applications.
Getting Started
Build from source (all dependencies are common Linux packages):
git clone --recurse-submodules https://github.com/YosysHQ/yosys.git
cd yosys
make -j$(nproc)The examples/ directory contains ready-to-run synthesis scripts. Start with examples/cmos/counter.ys — it is the simplest complete flow and the output is human-readable. Then try examples/basys3/run_yosys.ys to see what a real FPGA target looks like.
To synthesize your own Verilog for iCE40:
yosys -p "
read_verilog mydesign.v
synth_ice40 -top mytop -json mydesign.json
"
nextpnr-ice40 --hx1k --package tq144 --json mydesign.json --asc mydesign.asc
icepack mydesign.asc mydesign.bin
iceprog mydesign.binFour commands from Verilog to FPGA, all open source, all auditable.
What Yosys Cannot (Yet) Do Well
Honest assessment: Yosys is excellent at combinational and simple sequential logic. It struggles with:
- Timing-driven synthesis: Yosys does not have a built-in static timing analysis engine. ABC can minimize delay, but without an accurate timing model for the target (including routing delays), the result may not meet timing even if the gate-level delay looks fine. Vendor tools integrate STA tightly with synthesis and iterate.
- Complex memory inference: Generating block RAM (BRAM/URAM) from Verilog
regarrays requires the synthesizer to recognize the access pattern and match it to a physical BRAM primitive. Yosys handles the common cases but vendor tools are more aggressive here. - DSP inference: Multipliers, multiply-accumulate, and filter structures need to be recognized and mapped to DSP48 (Xilinx) or equivalent blocks. Yosys has passes for this but they are less mature than Vivado’s.
- SystemVerilog: The open-source
read_verilogfrontend is Verilog-2005. For SystemVerilog you need the commercial Verific frontend (available in Tabby CAD Suite) or an external elaborator likesv2v.
These are solvable engineering problems, not fundamental limitations. The gap with vendor tools has narrowed substantially in the last five years and continues to close.
The Bigger Picture
Every commercial synthesis tool — Synopsys Design Compiler, Cadence Genus, Xilinx Vivado, Intel Quartus — does approximately what Yosys does, with more engineering, more corner-case handling, and tighter integration with timing analysis. The conceptual steps are the same: elaborate the RTL into a generic netlist, optimize with technology-independent passes, map to the target library using a covering algorithm over logic cones, clean up and report.
Yosys makes this machinery visible. When you run the counter example and watch the statistics report show 17 cells collapsing from the 22 that existed before opt_dff, you are seeing something that happens inside Vivado every time you click Synthesize — except Vivado doesn’t show you. Understanding the mechanism means you can write Verilog that synthesizes predictably, recognize when a synthesis result looks wrong, and reason about area-timing tradeoffs without treating the tool as a black box.
That is the case for Yosys: not as a replacement for vendor tools, but as a lens through which all of synthesis becomes legible.
References
- Claire Wolf, Yosys Manual — the primary reference; covers RTLIL, the pass system, and the Tcl scripting interface
- Robert Brayton and Alan Mishchenko, “ABC: An Academic Industrial-Strength Verification Tool”, CAV 2010 — the paper behind ABC
- Clifford Wolf, “FPGA Synthesis with Yosys”, FOSDEM 2014 — an accessible introduction
- “Liberty Reference Manual”, Synopsys — the definitive specification for
.libfiles - Andrew Kahng et al., VLSI Physical Design: From Graph Partitioning to Timing Closure (Springer, 2011) — covers technology mapping and physical synthesis in depth