php_clickhousedocs

php_clickhouse

Native PHP extension for ClickHouse, built on the official clickhouse-cpp v2.6.2 client.

Speaks the native binary TCP protocol with LZ4 / ZSTD compression and optional TLS. This guide covers how to read and write every supported data type, configure the client, stream CSV/TSV, and use the full method surface. For install and benchmarks, see the README.

PHP 7.4 – 8.5 native TCP protocol LZ4 / ZSTD TLS multi-endpoint failover

Install #

Via PIE (the PHP Foundation's PECL successor):

pie install iliaal/php_clickhouse
# with TLS:
pie install iliaal/php_clickhouse --enable-clickhouse-openssl

Then add extension=clickhouse.so to your php.ini. Build-from-source instructions, platform support, and the Docker test-server recipe are in the README.

Quick start #

<?php
$ch = new ClickHouse([
    "host"        => "127.0.0.1",
    "port"        => 9000,
    "database"    => "test",
    "user"        => "default",
    "passwd"      => "",
    "compression" => "lz4",   // or "zstd" / true / false
]);

$ch->execute("CREATE TABLE IF NOT EXISTS events (
    id UInt32, ts DateTime64(3), tag LowCardinality(String)
) ENGINE = Memory");

$ch->insert("events", ["id", "ts", "tag"], [
    [1, time(), "alpha"],
    [2, time(), "beta"],
]);

// Filter by an in-memory list of IDs without bloating the SQL. The
// server reads `ext_ids` as a named temp table for this query only.
$hits = $ch->selectWithExternalData(
    "SELECT id, tag FROM events WHERE id IN ext_ids",
    [["name" => "ext_ids",
      "columns" => ["id" => "UInt32"],
      "rows" => [[1], [42], [1337]]]]
);

foreach ($ch->select("SELECT id, ts, tag FROM events ORDER BY id",
                     [], ClickHouse::DATE_AS_STRINGS) as $row) {
    print_r($row);
}

Data types #

The extension maps ClickHouse column types to PHP values on both the write path (insert(), insertAssoc(), write()) and the read path (select() and the streaming readers). The matrix below is the authoritative per-type reference.

Read / write matrix #

Each row gives what insert() accepts for a column of that type, what select() hands back, whether Nullable(T) is supported, and the behavior worth knowing. Unless noted, a null into a non-Nullable column is rejected, and a value of the wrong shape throws rather than silently coercing to 0.

ClickHouse typeWrite acceptsRead returnsNullableNotes
Int8Int64int, integral float, or numeric string; range-checked to the column widthPHP intyesOut-of-range throws. A fractional float (1.5) or partial numeric string ("12abc") is rejected.
UInt8, UInt16int ≥ 0, integral float, numeric string within widthPHP intyes
UInt32as above, plus a "0x…" hex stringPHP intyesHex must fully consume and fit in UInt32.
UInt64int ≥ 0, integral float, decimal string up to UINT64_MAX, or "0x…" hexPHP int if ≤ 263−1, otherwise a decimal stringyesValues above the signed range come back as strings to avoid collapsing to a negative int.
Int128int, or a signed decimal string in [−2127, 2127−1]decimal stringyesPHP ints are 64-bit; larger magnitudes need the string form.
UInt128int ≥ 0, or a decimal string up to 2128−1 (≤ 39 digits)decimal stringyes
Float32, Float64int, float, or numeric string; NaN/Inf rejectedPHP floatyesFloat32 round-trips at single precision.
Decimal, Decimal32/64/128(P,S)a scalar (string / int / float) stringified and stored as a scaled decimal (e.g. "12345.6789", 42)string with the scale point inserted ("12.34", "0.0000000000")yesScale capped at 38. Decimal256 is not supported. A non-scalar value (array / object) is rejected.
Stringany (objects via __toString); binary-safePHP string (binary-safe)yesA throwing __toString propagates the exception.
FixedString(N)string (≤ N bytes)PHP string with trailing NULs trimmedyesA value longer than N bytes is rejected.
UUIDhyphenated or bare-hex 32-char string32-char bare-hex string (no hyphens) by default; hyphenated canonical form with UUID_WITH_DASHESyesMap keys/values use the same formatting (bare-hex by default, dashed with the flag).
BoolPHP truthiness via zend_is_true (pass a real bool or 0/1)PHP boolyesThe string "false" is truthy and stores as true.
IPv4dotted-quad string, or an int / integral float matching toIPv4(N) (0 … 4294967295, MSB = first octet)dotted-quad stringyesFractional float or out-of-range int throws.
IPv6dotted/colon string (validated by inet_pton)canonical string ("::1")yes
Dateint (epoch seconds) or "YYYY-MM-DD"int epoch, or "YYYY-MM-DD" with DATE_AS_STRINGSyesInvalid calendar dates ("2024-02-30") and trailing garbage are rejected.
Date32same as Date; supports pre-epoch (-2208988800"1900-01-01")int epoch, or string with DATE_AS_STRINGSyes
DateTimeint (epoch seconds) or "YYYY-MM-DD HH:MM:SS"int epoch, or string with DATE_AS_STRINGSyesA numeric string is read as a textual timestamp, not a number.
DateTime64(N[, tz])int = whole epoch seconds, or "… HH:MM:SS.fff"; float = fractional epoch seconds but rejected at N ≥ 7raw int ticks by default, or "… HH:MM:SS.ffffff" with DATE_AS_STRINGSyesN capped at 9. A float can't hold nanosecond epochs exactly, so it's rejected past microseconds rather than rounded.
Timeint seconds-since-midnight only (string rejected)int seconds, or "HH:MM:SS" (signed) with DATE_AS_STRINGSyesServer needs ClickHouse 25.x or later. Values outside the int32 range are rejected.
Time64(N)int seconds only; float OK at N < 7, rejected at N ≥ 7; string rejectedraw int ticks, or "HH:MM:SS.fff" (sign-magnitude) with DATE_AS_STRINGSyesN capped at 9.
Enum8, Enum16int (a declared value) or a string name (validated)the enum name stringyesAn undeclared int or name throws.
Array(T)PHP list array of T cellsPHP list arrayyes, as Array(Nullable(T))Multidimensional Array(Array(T)) is rejected on write; reads of server-nested arrays work. Array(Tuple) writes are supported.
Tuple(...)PHP list array; arity must match the columnPHP positional list arrayinner fields may be NullableArity mismatch throws. Array(Tuple) works.
Map(K, V)PHP assoc array per rowPHP assoc array (int or string keys)Supported K/V set below.
LowCardinality(String) and FixedString / Nullable variantsstring (or null in the Nullable forms)string (or null)only the LC(Nullable(...)) formsOnly String / FixedString inner types; anything else throws.
JSONPHP array/object → JSON-encoded; string → raw JSON (validated client-side); null"{}"raw JSON string by default; nested assoc array with JSON_AS_ARRAY; nested stdClass with JSON_AS_OBJECTyes, as Nullable(JSON)Reads require output_format_native_write_json_as_string = 1. See JSON columns.
Point[x, y] (two numeric elements)[x, y] float pairServer needs allow_experimental_geo_types = 1.
Ringarray of pointsarray of points
Polygonarray of ringsarray of rings
MultiPolygonarray of polygonsarray of polygons

Map key/value support

Both the read decoder and the write path accept these scalar key and value types: String, Int8Int64, UInt8UInt64, Float32, Float64, UUID. The write path additionally accepts LowCardinality(String) keys and values. UInt64 entries above 263−1 come back as strings; UUID keys/values read back as bare 32-char hex by default (hyphenated with UUID_WITH_DASHES).

Map(LowCardinality(K), V) writes succeed but reads are not decoded by the vendored client. The data is queryable server-side, but a SELECT through this extension throws on the unsupported key type. Other types (Int128/UInt128, Decimal, Date*, Bool, IPv4/IPv6) are not valid Map keys or values here.

Write-side coercion #

Numeric columns (the integer, float, and decimal families) accept an int, an integral float, or a numeric string, and reject anything that doesn't fully parse as a number. The string-shaped columns are the exceptions worth memorizing:

Nested-type recursion (arrays, tuples, maps) is capped at depth 32.

Fetch modes #

fetch_mode is a bitmask combining the constants below. select, selectStream, selectStreamCallback, and selectWithExternalData all accept it. There are two categories: shape flags restructure the whole result, and value flags change how a single cell is rendered.

ConstantValueCategoryEffect
ClickHouse::FETCH_ONE1shapeReturn only the first cell of the first row, as a scalar (not an array).
ClickHouse::FETCH_KEY_PAIR2shapeBuild an assoc map from column 0 → column 1 (needs ≥ 2 columns; scalar key).
ClickHouse::FETCH_COLUMN8shapeReturn a flat list of just column 0 from each row.
ClickHouse::DATE_AS_STRINGS4valueDate / DateTime / Time families emit formatted strings instead of int epoch/ticks.
ClickHouse::JSON_AS_ARRAY16valueJSON cells decode to a nested assoc array (wins if both JSON bits are set).
ClickHouse::JSON_AS_OBJECT32valueJSON cells decode to a nested stdClass.
ClickHouse::UUID_WITH_DASHES64valueUUID cells (standalone and Map keys/values) render as the hyphenated canonical form instead of bare 32-char hex.

On the streaming readers (selectStream, selectStreamCallback), fetch_mode is the trailing argument, after settings. The shape flags (FETCH_ONE / FETCH_KEY_PAIR / FETCH_COLUMN) are ignored there; the iterator always yields full rows. Only the value flags (DATE_AS_STRINGS, JSON_AS_*, UUID_WITH_DASHES) take effect.

Configuration #

All keys go in the array passed to new ClickHouse([...]).

Connection #

KeyTypeDefaultDescription
hoststring127.0.0.1Server host
portint9000Native TCP port (or 9440 for TLS)
databasestringdefaultDefault database
userstringdefaultUsername
passwdstring(empty)Password
endpointsarray(none)List of [{host, port}, ...] for round-robin failover. Tried in order on connect failure.

Compression #

KeyTypeDefaultDescription
compressionbool / stringfalsefalse/"none" = uncompressed; true/"lz4" = LZ4 (fast); "zstd" = ZSTD (denser)
max_compression_chunk_sizeint65535Block size used by the compressor

Timeouts and retry #

KeyTypeDefaultDescription
connect_timeoutint (sec)5TCP connect deadline
connect_timeout_msint (ms)(none)Sub-second connect deadline; overrides the seconds key when set
receive_timeoutint (sec)0Read deadline (0 = no timeout)
receive_timeout_msint (ms)(none)Sub-second read deadline; overrides the seconds key when set
send_timeoutint (sec)0Write deadline
send_timeout_msint (ms)(none)Sub-second write deadline; overrides the seconds key when set
retry_countint1Send retries on transient failure
retry_timeoutint (sec)5Sleep between retries
tcp_nodelaybooltrueTCP_NODELAY
tcp_keepaliveboolfalseTCP keepalive
tcp_keepalive_idleint (sec)60Idle time before first keepalive probe
tcp_keepalive_intvlint (sec)5Interval between probes
tcp_keepalive_cntint3Failed probes before declaring dead

TLS #

Requires a build with --enable-clickhouse-openssl. Building without it and passing ssl => true raises ClickHouseException at construct time.

KeyTypeDefaultDescription
sslboolfalseEnable TLS
ssl_min_protocol_versionstringtls1.2Minimum protocol; one of tls1.0, tls1.1, tls1.2, tls1.3
ssl_skip_verifyboolfalseSkip cert validation; dev only
ssl_use_default_cabooltrueTrust the system CA bundle
ssl_ca_filesstring | array(none)PEM CA file path(s)
ssl_ca_directorystring(none)OpenSSL hashed-cert directory

Methods #

Every call signature is grouped below. $params, $query_id, and $settings mean the same thing everywhere: client-side / server-side placeholders, an optional query id propagated to the server, and a per-call settings override.

Constructor & DDL #

$ch = new ClickHouse(array $config);

// Run any statement that returns no result set (DDL, INSERT ... SELECT, etc.)
$ch->execute(string $sql,
             array $params = [],
             string $query_id = "",
             array $settings = []);

$ch->ping();   // returns true on success, throws on failure

Reading #

// Buffered read into a PHP array. $fetch_mode is the bitmask from "Fetch modes".
$rows = $ch->select(string $sql,
                    array $params = [],
                    int $fetch_mode = 0,
                    string $query_id = "",
                    array $settings = []);

// Read with external in-memory tables sent alongside the query.
// Each entry: ['name' => 'ext_x', 'columns' => ['c' => 'Type', ...],
//              'rows' => [[...], [...], ...]]. The query body references
// the external table by name (e.g. `WHERE id IN ext_ids`). Keeps the
// SQL small when filtering by big lists; multiple externals per call.
$rows = $ch->selectWithExternalData(string $sql,
                                    array  $externals,
                                    array  $params = [],
                                    int    $fetch_mode = 0,
                                    string $query_id = "",
                                    array  $settings = []);

Inserting #

// Bulk insert (entire dataset in one call), rows as positional arrays
// matching $columns.
$ch->insert(string $table, array $columns, array $values,
            string $query_id = "",
            array $settings = []);

// Same as insert(), but rows are associative arrays and the column list
// is derived from the first row's keys.
$ch->insertAssoc(string $table, array $rows,
                 string $query_id = "",
                 array $settings = []);

Streaming reads & writes #

For datasets larger than memory, or when you want to pipe rows to/from a file. The CSV/TSV stream helpers are documented with examples under CSV / TSV streaming.

// Streaming insert: open a block, append rows, close. write() flushes
// blocks to the server as you go.
$ch->writeStart(string $table, array $columns,
                string $query_id = "",
                array $settings = []);
$ch->write(array $values);
$ch->write(array $more_values);
$ch->writeEnd();

// Parse a TSV / CSV stream and insert in batches of $batch_rows. Returns
// rows inserted. See "CSV / TSV streaming".
$ch->insertFromStream(string $table, array $columns, mixed $stream,
                      string $format = "TabSeparated",
                      int    $batch_rows = 10000,
                      string $query_id = "",
                      array  $settings = []);

// Lazy read. selectStream() buffers raw result blocks for the iterator's
// lifetime so it stays re-iterable (Countable, rewind); it does NOT reduce
// peak memory for a huge result. For a genuinely streaming read holding one
// row at a time, use selectStreamCallback(). $fetch_mode is the trailing arg.
$iter = $ch->selectStream(string $sql, array $params = [],
                          string $query_id = "", array $settings = [],
                          int $fetch_mode = 0);
foreach ($iter as $row) { /* ... */ }   // ClickHouseRowIterator: Iterator + Countable

$ch->selectStreamCallback(string $sql, callable $cb,
                          array $params = [], string $query_id = "",
                          array $settings = [],
                          int $fetch_mode = 0);  // true per-row stream

// Write query rows straight to a PHP stream as TSV / CSV. Returns rows
// written. See "CSV / TSV streaming".
$ch->selectToStream(string $sql, array $params, mixed $stream,
                    string $format = "TabSeparated",
                    string $query_id = "", array $settings = []);

Result wrapper #

// smi2-style result wrapper: returns a ClickHouseStatement (Iterator,
// Countable, ArrayAccess, JsonSerializable) carrying a per-call stats
// snapshot. Use it for fetchOne / fetchKeyPair / fetchColumn on the result,
// or to keep stats around after running other queries. Plain $ch->select()
// is faster when you just need the array.
$stmt = $ch->selectStatement(string $sql, array $params = [],
                             string $query_id = "", array $settings = []);
foreach ($stmt as $row) { /* ... */ }
$stmt[0]; count($stmt); json_encode($stmt);
$stmt->fetchOne(); $stmt->fetchKeyPair(); $stmt->fetchColumn();
$stmt->toArray(); $stmt->statistics();

Settings & observability #

$ch->setSettings(array $settings);            // client-wide; per-call overrides; chainable
$ch->setSetting(string $key, mixed $value);  // single-key sugar, chainable
$ch->setDatabase(string $database);          // USE on the server, updates default; chainable
$ch->setProgressCallback(?callable $cb);
$ch->setProfileCallback(?callable $cb);
$ch->setVerbose(true);                 // JSON lifecycle lines on STDERR
$ch->setVerbose(fn($e, $ctx) => ...);  // or custom sink
$ch->setVerbose(false);                // disable
$stats = $ch->getStatistics();         // last query: rows, bytes, elapsed_ms, query_id

Connection & server info #

$ch->resetConnection();
$info = $ch->getServerInfo();          // name, version_*, revision, timezone, display_name
$ep   = $ch->getCurrentEndpoint();     // {host, port} of the active endpoint, or null
$ch->enableLogQueries(bool $enabled = true);
$log = $ch->getLogQueries();           // returns and clears the buffer (see Query log)

DDL & introspection helpers #

$ch->isExists(string $database, string $table);
$ch->showDatabases();
$ch->showProcesslist();
$ch->getServerVersion();
$ch->databaseSize(?string $database = null);     // {bytes_on_disk, rows}
$ch->tablesSize(?string $database = null);
$ch->tableSize(string $table);                   // {rows, bytes_on_disk, partitions, modification_time}
$ch->partitions(string $table);
$ch->showTables(?string $database = null, ?string $like = null);
$ch->showCreateTable(string $table);
$ch->getServerUptime();                // seconds
$ch->truncateTable(string $table);
$ch->dropPartition(string $table, string $partition);

JSON columns #

The JSON column type round-trips through PHP.

Reads require output_format_native_write_json_as_string = 1 on the session (set it via setSettings() or a per-query settings array). The vendored client only understands the string serialization; a read without this setting fails with a protocol error.

$ch->setSettings([
    "allow_experimental_json_type"              => 1, // only needed to CREATE a JSON column
    "output_format_native_write_json_as_string" => 1, // required to SELECT JSON
]);

// Insert: a PHP array or object is JSON-encoded; a string is stored as
// raw JSON text (validated client-side); null becomes the empty object {}.
$ch->insert("events", ["id", "payload"], [
    [1, ["level" => "info", "tags" => ["a", "b"]]],
    [2, '{"level":"warn"}'],
]);

// Read: raw JSON string by default.
$ch->select("SELECT payload FROM events");

// JSON_AS_ARRAY decodes to a nested associative array;
// JSON_AS_OBJECT decodes to nested stdClass.
$ch->select("SELECT payload FROM events", [], ClickHouse::JSON_AS_ARRAY);
$ch->select("SELECT payload FROM events", [], ClickHouse::JSON_AS_OBJECT);

CSV / TSV streaming #

Two methods move rows directly between a ClickHouse table and a PHP stream resource, formatting or parsing the bytes in C++ so neither path builds a full PHP result array. Both accept these formats: TabSeparated (alias TSV), TabSeparatedWithNames (alias TSVWithNames), CSV, and CSVWithNames.

Reading: selectToStream()

Writes query rows to a stream as TSV/CSV, block-by-block, and returns the number of rows written. Dates and datetimes emit ISO strings; Decimal / Int128 / UInt128 / UUID / IPv4 / IPv6 emit their string forms. NULL becomes \N in TSV and an unquoted \N in CSV (a real string equal to "\N" is quoted so it can't be misread as NULL).

$f = fopen("/tmp/events.tsv", "wb");
$n = $ch->selectToStream(
    "SELECT id, ts, tag FROM events ORDER BY id",
    [], $f, "TabSeparatedWithNames"
);
fclose($f);
echo "$n rows written\n";

Composite and geo columns are rejected: Array, Tuple, Map, Point, Ring, Polygon, and MultiPolygon can't be unambiguously serialized to a flat text format, so selectToStream() throws naming the offending type. Use select() for those. (Nullable and LowCardinality wrappers around a streamable scalar are fine.)

Writing: insertFromStream()

Parses a TSV/CSV stream and inserts rows in batches of $batch_rows (default 10000), returning the number of rows inserted. Only one batch of per-column zvals exists at a time, so inputs larger than memory work. Cells arrive as strings and run through the same write coercion as insert() for each server-supplied column type, so a numeric column still needs a valid numeric string.

$f = fopen("/tmp/events.csv", "rb");
$n = $ch->insertFromStream(
    "events", ["id", "ts", "tag"], $f,
    "CSVWithNames", 5000
);
fclose($f);
echo "$n rows inserted\n";

Placeholders #

Two placeholder syntaxes are supported in select / execute:

// Single-identifier substitution.
$ch->select("SELECT * FROM {tbl}", ["tbl" => "users"]);

// Column-list substitution via array value.
$ch->select("SELECT {cols} FROM users",
            ["cols" => ["id", "name", "email"]]);

// Server-side typed parameters, no client-side quoting needed.
$ch->select("SELECT * FROM users WHERE id IN ({ids:Array(UInt32)})",
            ["ids" => [1, 2, 3]]);

Settings #

setSettings() applies client-wide. The 5th argument on select / insert / execute / writeStart overrides per call. Both accept plain string => string pairs; PHP scalars are stringified for you.

$ch->setSettings(["max_execution_time" => "30"]);

// Per-call override.
$ch->select("SELECT * FROM big_table",
            [], 0, "",
            ["max_execution_time" => "5",
             "max_memory_usage"   => "1000000000"]);

Statistics and progress #

$ch->setProgressCallback(function (array $p) {
    fprintf(STDERR, "rows=%d bytes=%d\n", $p["rows"], $p["bytes"]);
});

$ch->select("SELECT count() FROM big_table");

$stats = $ch->getStatistics();
// rows_read, bytes_read, total_rows, written_rows, written_bytes,
// blocks, rows_before_limit, applied_limit, elapsed_ms

A progress or profile callback that throws unwinds the client's packet loop mid-stream. Call resetConnection() before reusing the client after a callback throws.

Query log #

enableLogQueries(true) turns on a per-client buffer that records each completed select / insert / execute / streaming writeEnd. Each entry is {sql, query_id, elapsed_ms, rows_read, bytes_read, error_code, error_message}. sql is capped before retention. error_code is 0 on success, the server error code on a ServerException, or -1 on client/network failure. getLogQueries() returns the buffer and clears it.

$ch->enableLogQueries(true);
$ch->select("SELECT count() FROM users");
$ch->insert("logins", ["user_id", "ts"], $batch);

foreach ($ch->getLogQueries() as $q) {
    fprintf(STDERR, "[%.1fms] %s\n", $q["elapsed_ms"], $q["sql"]);
}

Structured exceptions #

ClickHouseException carries three extra public properties: