Chapter 1

Gestures

Every touch interaction starts with a gesture — a specific pattern of finger movement that the system must recognize. Tap, swipe, drag, fling, pinch: each is defined by its own combination of timing, displacement, velocity, and finger count.

·····

Touch Down

Before any gesture is recognized, there's a simpler event: the finger touches the screen. This is the starting point for every interaction. The interface should respond instantly — a scale change, a highlight, a color shift — to confirm it received the touch.

This is not the same as a tap. A tap requires the finger to lift quickly. Touch down is the press phase alone — immediate, unconditional, with no waiting for the system to decide what gesture will follow. If it turns out to be a drag, the press state transitions smoothly. If it's a tap, the press state resolves on release.

fig-099 · touch down feedback

·····

Tap

The simplest gesture. Finger goes down, stays nearly still, comes back up quickly. It's the touch equivalent of a mouse click — defined by what doesn't happen. No significant movement (less than ~10px), no long hold (under ~300ms).
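Here is a minimal TypeScript sketch of the tap rule. It assumes the caller has recorded every pointer position, each with a millisecond timestamp, between down and up. The `PointerSample` type, the `isTap` function, and the two threshold constants are hypothetical names. The 10px and 300ms values are the approximate figures above.

```typescript
// Approximate thresholds from the text (assumed values; tune per platform).
const TAP_MAX_DISTANCE_PX = 10;
const TAP_MAX_DURATION_MS = 300;

interface PointerSample {
  x: number;
  y: number;
  t: number; // timestamp in milliseconds
}

// samples[0] is the touch-down; the last entry is the lift.
function isTap(samples: PointerSample[]): boolean {
  if (samples.length < 2) return false;
  const down = samples[0];
  const up = samples[samples.length - 1];

  // Use the largest distance seen at any point, so a finger that
  // wanders away and comes back is not mistaken for a tap.
  let maxDistance = 0;
  for (const s of samples) {
    maxDistance = Math.max(maxDistance, Math.hypot(s.x - down.x, s.y - down.y));
  }

  return maxDistance < TAP_MAX_DISTANCE_PX && up.t - down.t < TAP_MAX_DURATION_MS;
}

console.log(isTap([{ x: 0, y: 0, t: 0 }, { x: 3, y: 4, t: 120 }])); // true
```

The function checks the maximum displacement over the whole gesture, not just the distance between the down and up points. Without that, a finger could drift 30px and return, and the gesture would still count as a tap.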

Tap: quick touch with minimal displacement

fig-100 · tap detector

·····

Double Tap

Two taps in quick succession, near the same spot. The tricky part: the system has to decide whether the first tap is a single tap or the beginning of a double tap. The classic solution is a 300ms wait after the first tap. If a second arrives within that window, it's a double tap. If not, fire the single tap.

This delay is why interfaces that support both single and double tap often feel sluggish. The best approach: either fire the single tap immediately and reverse it if a double tap follows, or assign single and double tap to separate, unambiguous targets so they never conflict.
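The classic wait-then-decide strategy can be sketched as a small recognizer that holds the first tap for up to 300ms. This sketch assumes taps have already been detected upstream. It also assumes something calls `tick` periodically, such as a timer. `DoubleTapRecognizer`, `tick`, and the event shape are hypothetical names. The 25px radius matches the figure caption below. Only the delayed approach is implemented here, not the fire-and-reverse alternative.

```typescript
const DOUBLE_TAP_WINDOW_MS = 300;
const DOUBLE_TAP_MAX_DISTANCE_PX = 25;

interface Tap {
  x: number;
  y: number;
  t: number; // time the tap completed, in milliseconds
}

interface TapEvent extends Tap {
  kind: "single" | "double";
}

class DoubleTapRecognizer {
  private pending: Tap | null = null;
  private readonly emit: (event: TapEvent) => void;

  constructor(emit: (event: TapEvent) => void) {
    this.emit = emit;
  }

  // Feed every recognized tap here.
  onTap(tap: Tap): void {
    this.tick(tap.t); // flush a first tap whose window has already expired
    const first = this.pending;
    this.pending = null;

    if (first !== null) {
      const distance = Math.hypot(tap.x - first.x, tap.y - first.y);
      if (distance <= DOUBLE_TAP_MAX_DISTANCE_PX) {
        this.emit({ kind: "double", x: tap.x, y: tap.y, t: tap.t });
        return;
      }
      // Second tap landed too far away, so the first one was a single tap.
      this.emit({ kind: "single", x: first.x, y: first.y, t: first.t });
    }
    this.pending = tap; // wait to see whether a second tap follows
  }

  // Call periodically (for example from a timer) with the current time.
  tick(now: number): void {
    const first = this.pending;
    if (first !== null && now - first.t > DOUBLE_TAP_WINDOW_MS) {
      this.pending = null;
      this.emit({ kind: "single", x: first.x, y: first.y, t: first.t });
    }
  }
}
```

The single-tap event is only emitted after the window expires. That emission is where the sluggishness described above comes from.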

Double tap: two taps within 300ms and 25px

fig-101 · single vs double tap

·····

Long Press

Finger goes down and stays down without moving. After a threshold (typically 400–500ms), the system fires a long press. It's the only gesture where time alone triggers recognition while the finger is still touching.

The key design principle: show a progress indicator — a ring filling around the touch point — so the user knows something will happen if they keep holding. Without visual feedback, the user is just staring at a screen wondering if anything registered. This “dead air” feeling is one of the worst UX failures in gesture design.
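One way to drive both the recognizer and the progress ring from the same clock is to compute a progress value from 0 to 1 on every frame. This sketch makes two assumptions. It uses a 450ms threshold, the middle of the 400 to 500ms range. It also borrows the 10px movement tolerance from the tap rule. `LongPressTracker` and its methods are hypothetical names.

```typescript
const LONG_PRESS_MS = 450; // assumed: middle of the 400 to 500ms range
const MOVE_TOLERANCE_PX = 10; // assumed: same tolerance as a tap

type LongPressStatus = "pending" | "fired" | "cancelled";

class LongPressTracker {
  private current: LongPressStatus = "pending";
  private readonly startX: number;
  private readonly startY: number;
  private readonly startT: number;
  private readonly onFire: () => void;

  constructor(x: number, y: number, t: number, onFire: () => void) {
    this.startX = x;
    this.startY = y;
    this.startT = t;
    this.onFire = onFire;
  }

  get status(): LongPressStatus {
    return this.current;
  }

  // Call on every pointer move and every animation frame while the finger
  // is down. Returns ring progress from 0 to 1.
  update(x: number, y: number, now: number): number {
    if (this.current === "fired") return 1;
    if (this.current === "cancelled") return 0;

    if (Math.hypot(x - this.startX, y - this.startY) > MOVE_TOLERANCE_PX) {
      this.current = "cancelled"; // the finger moved, so this is not a long press
      return 0;
    }

    const progress = Math.min((now - this.startT) / LONG_PRESS_MS, 1);
    if (progress >= 1) {
      this.current = "fired";
      this.onFire(); // fires while the finger is still down
    }
    return progress;
  }

  // Lifting before the threshold cancels the long press.
  release(): void {
    if (this.current === "pending") this.current = "cancelled";
  }
}
```

The returned progress can be mapped directly to the ring's sweep angle, so the ring and the recognizer can never disagree.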

Long press: timer fires while finger is still down

fig-102 · long press with progress

·····

Swipe

A quick, directional finger movement — short, fast, and decisive. Swipe is defined by both velocity (how fast your finger was moving at release) and direction (left, right, up, or down). It's distinct from a slow drag: the system checks whether the release speed exceeds a threshold (typically 600–1000 px/s) and whether the finger traveled far enough in a dominant direction.

The demo in fig-103 deliberately relaxes this to an either/or rule. It recognizes a swipe when the release speed exceeds about 650px/s or the horizontal travel exceeds 60px, so you can test velocity and displacement independently.
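The demo's either/or rule fits in a few lines. This sketch assumes a horizontally swiped card. It takes two inputs: the total horizontal travel `dx` in px, and the horizontal release velocity `vx` in px/s. All names are hypothetical. When the velocity rule fires, its sign decides the direction, so a quick flick wins over the travel accumulated so far.

```typescript
// Thresholds used by the demo described above.
const SWIPE_MIN_SPEED_PX_S = 650;
const SWIPE_MIN_TRAVEL_PX = 60;

type SwipeDirection = "left" | "right";

// dx: total horizontal travel since touch-down, in px.
// vx: horizontal velocity at release, in px/s.
function classifyHorizontalSwipe(dx: number, vx: number): SwipeDirection | null {
  const fastEnough = Math.abs(vx) > SWIPE_MIN_SPEED_PX_S;
  const farEnough = Math.abs(dx) > SWIPE_MIN_TRAVEL_PX;
  if (!fastEnough && !farEnough) return null; // neither rule met: not a swipe

  // Prefer the release velocity's direction when that rule fired.
  const sign = fastEnough ? Math.sign(vx) : Math.sign(dx);
  return sign > 0 ? "right" : "left";
}

console.log(classifyHorizontalSwipe(30, 800)); // → "right" (velocity rule)
console.log(classifyHorizontalSwipe(-80, -100)); // → "left" (travel rule)
console.log(classifyHorizontalSwipe(20, 200)); // → null
```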

Swipe is the gesture that drives some of the most iconic mobile interactions — dismissing notifications, navigating between pages, revealing actions. But the recognition itself is simple: it's a drag that ends with sufficient velocity in a clear direction. The complexity lives in what happens after the swipe is detected (covered in Chapter 3).

fig-103 · swipe card

·····

Drag / Pan

The fundamental continuous gesture: your finger goes down, moves, and the element follows. 1:1 tracking means the element moves exactly as far as your finger, with zero lag and zero drift. This direct coupling is what makes drag feel “real” — the element appears glued to your finger.

But a drag doesn't start the instant you touch down. There's a dead zone called the touch slop — typically 10px — that your finger must cross before the system commits to a drag. This prevents accidental drags when you intended a tap or long press.
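Here is a sketch of the slop check followed by 1:1 tracking. It assumes a 10px slop, measured as straight-line distance from the touch-down point. `DragTracker` and `move` are hypothetical names. Once the slop is crossed, the drag stays committed, even if the finger wanders back inside the zone.

```typescript
const TOUCH_SLOP_PX = 10; // assumed typical value

interface Offset {
  dx: number;
  dy: number;
}

class DragTracker {
  private readonly downX: number;
  private readonly downY: number;
  private dragging = false;

  constructor(downX: number, downY: number) {
    this.downX = downX;
    this.downY = downY;
  }

  get isDragging(): boolean {
    return this.dragging;
  }

  // Returns the offset to apply to the element, or null while the finger is
  // still inside the slop zone (a tap or long press is still possible).
  move(x: number, y: number): Offset | null {
    const dx = x - this.downX;
    const dy = y - this.downY;
    if (!this.dragging) {
      if (Math.hypot(dx, dy) <= TOUCH_SLOP_PX) return null;
      this.dragging = true; // committed for the rest of the gesture
    }
    // 1:1 tracking: the element moves exactly as far as the finger.
    return { dx, dy };
  }
}
```

The offset is measured from the original touch-down point. As a result, the element catches up by the slop distance on the frame the drag commits. Some implementations instead re-base the origin at the crossing point to avoid that small jump.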

On release, the element springs back to its resting position. This provides closure — you see the gesture has ended and the system has returned to its default state.

Drag: 1:1 tracking after crossing the touch slop zone

fig-104a · 1:1 tracking + velocity transfer

The key to making 1:1 tracking feel natural: transfer the finger's velocity to the element on release. Sample the last ~100ms of movement, calculate the speed, and give that momentum to the spring. The element continues moving in the direction you were dragging, then settles into place.
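The velocity sampling can be sketched as a short buffer of recent samples. This version simply divides the displacement across the last 100ms by the elapsed time. `VelocityTracker` is a hypothetical name. Production trackers often fit a curve over the window instead (for example with least squares), which is less sensitive to noisy samples.

```typescript
const VELOCITY_WINDOW_MS = 100;

interface Sample {
  x: number;
  y: number;
  t: number; // milliseconds
}

class VelocityTracker {
  private samples: Sample[] = [];

  add(x: number, y: number, t: number): void {
    this.samples.push({ x, y, t });
    // Keep the buffer short: drop samples that fell out of the window.
    while (this.samples.length > 0 && t - this.samples[0].t > VELOCITY_WINDOW_MS) {
      this.samples.shift();
    }
  }

  // Velocity in px/s over the samples inside the window ending at `now`
  // (pass the release timestamp).
  velocity(now: number): { vx: number; vy: number } {
    const recent = this.samples.filter((s) => now - s.t <= VELOCITY_WINDOW_MS);
    if (recent.length < 2) return { vx: 0, vy: 0 }; // paused, or too few samples
    const first = recent[0];
    const last = recent[recent.length - 1];
    const dtSeconds = (last.t - first.t) / 1000;
    if (dtSeconds <= 0) return { vx: 0, vy: 0 };
    return {
      vx: (last.x - first.x) / dtSeconds,
      vy: (last.y - first.y) / dtSeconds,
    };
  }
}
```

Passing the release time to `velocity` matters when the finger paused before lifting. In that case the tracker reports zero velocity rather than a stale one, and the element settles without a phantom throw.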

Compare with a spring delay approach below. The element follows your finger with a soft lag during the drag itself, creating a sense of weight. This can feel more organic but sacrifices precision.
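Spring delay tracking can be sketched as a damped spring whose target is the finger position, stepped once per frame. The stiffness and damping values are assumptions chosen to be slightly underdamped. The integrator is semi-implicit Euler. `stepSpring` is a hypothetical name. For 2D, run one spring per axis.

```typescript
// Assumed tuning: unit mass, damping ratio of about 0.87 (slightly underdamped).
const STIFFNESS = 300;
const DAMPING = 30;

interface SpringState {
  position: number;
  velocity: number;
}

// Advance a unit-mass damped spring toward `target` by `dt` seconds
// using semi-implicit Euler integration (velocity first, then position).
function stepSpring(state: SpringState, target: number, dt: number): SpringState {
  const force = STIFFNESS * (target - state.position) - DAMPING * state.velocity;
  const velocity = state.velocity + force * dt;
  const position = state.position + velocity * dt;
  return { position, velocity };
}
```

During the drag, call `stepSpring(state, fingerX, 1 / 60)` each frame. The element trails the finger by a few frames instead of sitting exactly under it, which produces the sense of weight and the loss of precision described above.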

fig-104b · spring delay tracking

·····

Axis Lock

When you scroll a list, you usually mean to scroll vertically or horizontally, not both. Axis locking detects the dominant direction of your initial movement and constrains everything after that to a single axis. This prevents diagonal wobble and makes scrolling feel precise and intentional.

The system decides the axis the moment your finger crosses the touch slop threshold: whichever direction has more displacement wins, and the other axis is locked out for the rest of the gesture. This is why scrolling a list feels like it's on a rail.
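Axis lock reduces to a one-time decision at the slop boundary. This sketch assumes a 10px slop and breaks exact ties in favor of the X axis. `AxisLock` is a hypothetical name.

```typescript
const TOUCH_SLOP_PX = 10; // assumed

type Axis = "x" | "y";

class AxisLock {
  private axis: Axis | null = null;
  private readonly startX: number;
  private readonly startY: number;

  constructor(startX: number, startY: number) {
    this.startX = startX;
    this.startY = startY;
  }

  get lockedAxis(): Axis | null {
    return this.axis;
  }

  // Returns the constrained offset, or null while inside the slop zone.
  move(x: number, y: number): { dx: number; dy: number } | null {
    const dx = x - this.startX;
    const dy = y - this.startY;
    if (this.axis === null) {
      if (Math.hypot(dx, dy) <= TOUCH_SLOP_PX) return null;
      // Decide once: the axis with more displacement wins (ties go to x).
      this.axis = Math.abs(dx) >= Math.abs(dy) ? "x" : "y";
    }
    // The other axis is locked out for the rest of the gesture.
    return this.axis === "x" ? { dx, dy: 0 } : { dx: 0, dy };
  }
}
```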

Axis lock: initial movement determines X or Y constraint

fig-105 · axis lock demo

·····

Pinch / Zoom

The only gesture in this chapter that requires two fingers. Place two fingers on the screen and move them apart to zoom in, together to zoom out. The scale change tracks the ratio between your current finger distance and the distance when the pinch started, so doubling the distance between your fingers doubles the zoom level.

The system tracks each finger independently using pointer IDs. When two pointers are active, it calculates the distance between them (the pinch span) and the midpoint (the zoom center). The zoom center matters because it determines where the zoom happens — the point between your fingers stays fixed on screen while everything else scales around it.

On release, if the scale is outside the allowed range (too small or too large), a spring pulls it back to the nearest valid value — rubber banding applied to scale instead of position. On desktop, you can use scroll wheel, right-click drag, and keyboard shortcuts (+/-/0) as fallback controls.
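The pinch math can be sketched with two pure functions. `pinchView` assumes the caller already tracks the two active pointers, for example in a `Map` keyed by `PointerEvent.pointerId`. The caller passes their positions at gesture start and now. The view is modeled as `screen = translate + scale * content`. The `View` type, the function names, and the 1x to 4x limits are all assumptions. `clampScale` only computes the target scale on release. Animating toward it is left to a spring.

```typescript
interface Point {
  x: number;
  y: number;
}

// View transform, per axis: screen = translate + scale * content.
interface View {
  scale: number;
  tx: number;
  ty: number;
}

const MIN_SCALE = 1; // assumed limits
const MAX_SCALE = 4;

function midpoint(a: Point, b: Point): Point {
  return { x: (a.x + b.x) / 2, y: (a.y + b.y) / 2 };
}

// a0/b0: the two pointers when the second finger went down.
// a1/b1: the same two pointers now. startView: the view at that moment.
function pinchView(startView: View, a0: Point, b0: Point, a1: Point, b1: Point): View {
  const startSpan = Math.hypot(b0.x - a0.x, b0.y - a0.y);
  const span = Math.hypot(b1.x - a1.x, b1.y - a1.y);
  if (startSpan === 0) return startView; // degenerate: both fingers on one spot

  // Scale tracks the ratio of the current span to the starting span.
  const scale = startView.scale * (span / startSpan);

  // The content point that sat under the fingers' midpoint at the start...
  const m0 = midpoint(a0, b0);
  const cx = (m0.x - startView.tx) / startView.scale;
  const cy = (m0.y - startView.ty) / startView.scale;

  // ...is kept under the current midpoint, so zoom happens around it.
  const m1 = midpoint(a1, b1);
  return { scale, tx: m1.x - scale * cx, ty: m1.y - scale * cy };
}

// On release, the nearest valid scale is where the spring pulls back to.
function clampScale(scale: number): number {
  return Math.min(Math.max(scale, MIN_SCALE), MAX_SCALE);
}
```

When the fingers spread symmetrically, the midpoint does not move, so the content under it stays fixed on screen. If the midpoint drifts, the same formula also pans the content along with the fingers.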

fig-106 · pinch to zoom

·····

Fling

A drag that ends while your finger is still moving fast is a fling. The velocity at the moment you lift your finger determines what happens next — if the speed exceeds a threshold, the system treats it as a fling rather than a simple release.

Fling is the bridge between gesture and physics. The recognizer decides whether a fling occurred; the physics system decides what happens after — momentum scrolling, page snap, or dismiss animation. The velocity from your fling becomes the initial speed for whatever animation follows.
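The recognizer side of a fling can be sketched as a single threshold check. Its output carries the velocity forward for the physics step to consume. The 950 px/s threshold is the value listed in the summary at the end of this chapter. `classifyRelease` and the `ReleaseOutcome` shape are hypothetical. The release velocity is assumed to come from a velocity tracker.

```typescript
// Threshold as listed in this chapter's summary (treat as tunable).
const FLING_MIN_SPEED_PX_S = 950;

type ReleaseOutcome =
  | { kind: "fling"; vx: number; vy: number } // hand this velocity to the physics step
  | { kind: "release" }; // slow release: settle without momentum

// vx, vy: pointer velocity at lift, in px/s.
function classifyRelease(vx: number, vy: number): ReleaseOutcome {
  const speed = Math.hypot(vx, vy);
  if (speed > FLING_MIN_SPEED_PX_S) {
    return { kind: "fling", vx, vy };
  }
  return { kind: "release" };
}
```

The outcome keeps the velocity rather than returning just a boolean. That lets the follow-up animation (momentum, page snap, or dismiss) start at the finger's speed instead of from rest.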

Fling: drag ending with high velocity becomes a fling event

fig-107 · fling detector

·····

Gesture competition

Here's the fundamental challenge: tap, long press, and drag all start with the same action — finger down. The system can't know in advance which gesture you intend. It has to wait, observe, and decide.

This is called gesture arbitration. The moment your finger touches the screen, three recognizers start competing. The first condition to trigger wins:

Finger moves more than ~10px → it's a drag (or swipe if fast enough). Cancel the long-press timer.
Finger lifts quickly → it's a tap. Cancel the long-press timer.
400ms passes with no movement → it's a long press.
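The three exits above can be sketched as a small state machine. It assumes pointer moves and a periodic `tick` are fed in with millisecond timestamps; `tick` stands in for the long-press timer. It also assumes that swipe versus plain drag is decided later, from the release velocity. `GestureArena` and the state names are hypothetical.

```typescript
const TOUCH_SLOP_PX = 10; // from the first rule above
const LONG_PRESS_MS = 400; // from the third rule above

type GestureState = "idle" | "pressed" | "tap" | "longPress" | "drag";

class GestureArena {
  state: GestureState = "idle";
  private x0 = 0;
  private y0 = 0;
  private t0 = 0;

  down(x: number, y: number, t: number): void {
    this.state = "pressed";
    this.x0 = x;
    this.y0 = y;
    this.t0 = t;
  }

  // Exit 1: moving past the slop makes it a drag. Leaving "pressed" is what
  // cancels the long press, because tick() only acts in that state.
  move(x: number, y: number, t: number): void {
    if (this.state !== "pressed") return;
    if (Math.hypot(x - this.x0, y - this.y0) > TOUCH_SLOP_PX) {
      this.state = "drag";
      return;
    }
    this.tick(t);
  }

  // Exit 3: stands in for the long-press timer; call it every frame while pressed.
  tick(t: number): void {
    if (this.state === "pressed" && t - this.t0 >= LONG_PRESS_MS) {
      this.state = "longPress";
    }
  }

  // Exit 2: lifting before any other exit triggered makes it a tap.
  up(t: number): void {
    this.tick(t); // in case the timer was due but not yet serviced
    if (this.state === "pressed") this.state = "tap";
  }
}
```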

Try it below: quick tap, hold without moving, or press and drag. Watch the state badges to see which recognizer wins.

Gesture arbitration: three exits from the PRESSED state

fig-109 · recognizer arena

·····

Putting it together

Every gesture is built from the same raw material: pointer down, move, up. What distinguishes them is timing, displacement, velocity, and finger count:

Tap — quick down + up under 300ms, minimal movement
Double tap — two taps within ~300ms
Long press — hold without moving for ~400ms
Drag — move beyond touch slop (~10px), 1:1 tracking
Swipe — fast directional drag (>650px/s or >60px in this demo)
Fling — drag released with high velocity (>950 px/s)
Pinch / Zoom — two fingers, distance change = scale

These primitives combine to create every touch interaction you use daily — scrolling, dismissing, zooming, selecting. In the next chapter, we'll explore how physics makes these gestures feel alive: momentum, springs, bounce-back, and the math that turns cold input into warm, responsive motion.