php_clickhouse
Native PHP extension for ClickHouse, built on the official clickhouse-cpp v2.6.2 client.
Speaks the native binary TCP protocol with LZ4 / ZSTD compression and optional TLS. This guide covers how to read and write every supported data type, configure the client, stream CSV/TSV, and use the full method surface. For install and benchmarks, see the README.
PHP 7.4 – 8.5 native TCP protocol LZ4 / ZSTD TLS multi-endpoint failover
Install #
Via PIE (the PHP Foundation's PECL successor):
pie install iliaal/php_clickhouse
# with TLS:
pie install iliaal/php_clickhouse --enable-clickhouse-openssl
Then add extension=clickhouse.so to your php.ini. Build-from-source instructions, platform support, and the Docker test-server recipe are in the README.
Quick start #
<?php
$ch = new ClickHouse([
"host" => "127.0.0.1",
"port" => 9000,
"database" => "test",
"user" => "default",
"passwd" => "",
"compression" => "lz4", // or "zstd" / true / false
]);
$ch->execute("CREATE TABLE IF NOT EXISTS events (
id UInt32, ts DateTime64(3), tag LowCardinality(String)
) ENGINE = Memory");
$ch->insert("events", ["id", "ts", "tag"], [
[1, time(), "alpha"],
[2, time(), "beta"],
]);
// Filter by an in-memory list of IDs without bloating the SQL. The
// server reads `ext_ids` as a named temp table for this query only.
$hits = $ch->selectWithExternalData(
"SELECT id, tag FROM events WHERE id IN ext_ids",
[["name" => "ext_ids",
"columns" => ["id" => "UInt32"],
"rows" => [[1], [42], [1337]]]]
);
foreach ($ch->select("SELECT id, ts, tag FROM events ORDER BY id",
[], ClickHouse::DATE_AS_STRINGS) as $row) {
print_r($row);
}
Data types #
The extension maps ClickHouse column types to PHP values on both the write path (insert(), insertAssoc(), write()) and the read path (select() and the streaming readers). The matrix below is the authoritative per-type reference.
Read / write matrix #
Each row gives what insert() accepts for a column of that type, what select() hands back, whether Nullable(T) is supported, and the behavior worth knowing. Unless noted, a null into a non-Nullable column is rejected, and a value of the wrong shape throws rather than silently coercing to 0.
| ClickHouse type | Write accepts | Read returns | Nullable | Notes |
|---|---|---|---|---|
Int8 … Int64 | int, integral float, or numeric string; range-checked to the column width | PHP int | yes | Out-of-range throws. A fractional float (1.5) or partial numeric string ("12abc") is rejected. |
UInt8, UInt16 | int ≥ 0, integral float, numeric string within width | PHP int | yes | — |
UInt32 | as above, plus a "0x…" hex string | PHP int | yes | Hex must fully consume and fit in UInt32. |
UInt64 | int ≥ 0, integral float, decimal string up to UINT64_MAX, or "0x…" hex | PHP int if ≤ 263−1, otherwise a decimal string | yes | Values above the signed range come back as strings to avoid collapsing to a negative int. |
Int128 | int, or a signed decimal string in [−2127, 2127−1] | decimal string | yes | PHP ints are 64-bit; larger magnitudes need the string form. |
UInt128 | int ≥ 0, or a decimal string up to 2128−1 (≤ 39 digits) | decimal string | yes | — |
Float32, Float64 | int, float, or numeric string; NaN/Inf rejected | PHP float | yes | Float32 round-trips at single precision. |
Decimal, Decimal32/64/128(P,S) | a scalar (string / int / float) stringified and stored as a scaled decimal (e.g. "12345.6789", 42) | string with the scale point inserted ("12.34", "0.0000000000") | yes | Scale capped at 38. Decimal256 is not supported. A non-scalar value (array / object) is rejected. |
String | any (objects via __toString); binary-safe | PHP string (binary-safe) | yes | A throwing __toString propagates the exception. |
FixedString(N) | string (≤ N bytes) | PHP string with trailing NULs trimmed | yes | A value longer than N bytes is rejected. |
UUID | hyphenated or bare-hex 32-char string | 32-char bare-hex string (no hyphens) by default; hyphenated canonical form with UUID_WITH_DASHES | yes | Map keys/values use the same formatting (bare-hex by default, dashed with the flag). |
Bool | PHP truthiness via zend_is_true (pass a real bool or 0/1) | PHP bool | yes | The string "false" is truthy and stores as true. |
IPv4 | dotted-quad string, or an int / integral float matching toIPv4(N) (0 … 4294967295, MSB = first octet) | dotted-quad string | yes | Fractional float or out-of-range int throws. |
IPv6 | dotted/colon string (validated by inet_pton) | canonical string ("::1") | yes | — |
Date | int (epoch seconds) or "YYYY-MM-DD" | int epoch, or "YYYY-MM-DD" with DATE_AS_STRINGS | yes | Invalid calendar dates ("2024-02-30") and trailing garbage are rejected. |
Date32 | same as Date; supports pre-epoch (-2208988800 → "1900-01-01") | int epoch, or string with DATE_AS_STRINGS | yes | — |
DateTime | int (epoch seconds) or "YYYY-MM-DD HH:MM:SS" | int epoch, or string with DATE_AS_STRINGS | yes | A numeric string is read as a textual timestamp, not a number. |
DateTime64(N[, tz]) | int = whole epoch seconds, or "… HH:MM:SS.fff"; float = fractional epoch seconds but rejected at N ≥ 7 | raw int ticks by default, or "… HH:MM:SS.ffffff" with DATE_AS_STRINGS | yes | N capped at 9. A float can't hold nanosecond epochs exactly, so it's rejected past microseconds rather than rounded. |
Time | int seconds-since-midnight only (string rejected) | int seconds, or "HH:MM:SS" (signed) with DATE_AS_STRINGS | yes | Server needs ClickHouse 25.x or later. Values outside the int32 range are rejected. |
Time64(N) | int seconds only; float OK at N < 7, rejected at N ≥ 7; string rejected | raw int ticks, or "HH:MM:SS.fff" (sign-magnitude) with DATE_AS_STRINGS | yes | N capped at 9. |
Enum8, Enum16 | int (a declared value) or a string name (validated) | the enum name string | yes | An undeclared int or name throws. |
Array(T) | PHP list array of T cells | PHP list array | yes, as Array(Nullable(T)) | Multidimensional Array(Array(T)) is rejected on write; reads of server-nested arrays work. Array(Tuple) writes are supported. |
Tuple(...) | PHP list array; arity must match the column | PHP positional list array | inner fields may be Nullable | Arity mismatch throws. Array(Tuple) works. |
Map(K, V) | PHP assoc array per row | PHP assoc array (int or string keys) | — | Supported K/V set below. |
LowCardinality(String) and FixedString / Nullable variants | string (or null in the Nullable forms) | string (or null) | only the LC(Nullable(...)) forms | Only String / FixedString inner types; anything else throws. |
JSON | PHP array/object → JSON-encoded; string → raw JSON (validated client-side); null → "{}" | raw JSON string by default; nested assoc array with JSON_AS_ARRAY; nested stdClass with JSON_AS_OBJECT | yes, as Nullable(JSON) | Reads require output_format_native_write_json_as_string = 1. See JSON columns. |
Point | [x, y] (two numeric elements) | [x, y] float pair | — | Server needs allow_experimental_geo_types = 1. |
Ring | array of points | array of points | — | — |
Polygon | array of rings | array of rings | — | — |
MultiPolygon | array of polygons | array of polygons | — | — |
Map key/value support
Both the read decoder and the write path accept these scalar key and value types: String, Int8 … Int64, UInt8 … UInt64, Float32, Float64, UUID. The write path additionally accepts LowCardinality(String) keys and values. UInt64 entries above 263−1 come back as strings; UUID keys/values read back as bare 32-char hex by default (hyphenated with UUID_WITH_DASHES).
Map(LowCardinality(K), V) writes succeed but reads are not decoded by the vendored client. The data is queryable server-side, but a SELECT through this extension throws on the unsupported key type. Other types (Int128/UInt128, Decimal, Date*, Bool, IPv4/IPv6) are not valid Map keys or values here.
Write-side coercion #
Numeric columns (the integer, float, and decimal families) accept an int, an integral float, or a numeric string, and reject anything that doesn't fully parse as a number. The string-shaped columns are the exceptions worth memorizing:
Date,DateTime,IPv4,IPv6treat a PHP string as a textual representation (aY-m-ddate, a dotted-quad/colon address), not a numeric form. To pass the integer form, pass an actual int.TimeandTime64accept numeric seconds only; a"HH:MM:SS"string is rejected on write.DateTime64/Time64at precision ≥ 7 (sub-microsecond) reject float input, since a float can't represent nanosecond-since-epoch values exactly. Pass integer ticks or a formatted string instead.
Nested-type recursion (arrays, tuples, maps) is capped at depth 32.
Fetch modes #
fetch_mode is a bitmask combining the constants below. select, selectStream, selectStreamCallback, and selectWithExternalData all accept it. There are two categories: shape flags restructure the whole result, and value flags change how a single cell is rendered.
| Constant | Value | Category | Effect |
|---|---|---|---|
ClickHouse::FETCH_ONE | 1 | shape | Return only the first cell of the first row, as a scalar (not an array). |
ClickHouse::FETCH_KEY_PAIR | 2 | shape | Build an assoc map from column 0 → column 1 (needs ≥ 2 columns; scalar key). |
ClickHouse::FETCH_COLUMN | 8 | shape | Return a flat list of just column 0 from each row. |
ClickHouse::DATE_AS_STRINGS | 4 | value | Date / DateTime / Time families emit formatted strings instead of int epoch/ticks. |
ClickHouse::JSON_AS_ARRAY | 16 | value | JSON cells decode to a nested assoc array (wins if both JSON bits are set). |
ClickHouse::JSON_AS_OBJECT | 32 | value | JSON cells decode to a nested stdClass. |
ClickHouse::UUID_WITH_DASHES | 64 | value | UUID cells (standalone and Map keys/values) render as the hyphenated canonical form instead of bare 32-char hex. |
On the streaming readers (selectStream, selectStreamCallback), fetch_mode is the trailing argument, after settings. The shape flags (FETCH_ONE / FETCH_KEY_PAIR / FETCH_COLUMN) are ignored there; the iterator always yields full rows. Only the value flags (DATE_AS_STRINGS, JSON_AS_*, UUID_WITH_DASHES) take effect.
Configuration #
All keys go in the array passed to new ClickHouse([...]).
Connection #
| Key | Type | Default | Description |
|---|---|---|---|
host | string | 127.0.0.1 | Server host |
port | int | 9000 | Native TCP port (or 9440 for TLS) |
database | string | default | Default database |
user | string | default | Username |
passwd | string | (empty) | Password |
endpoints | array | (none) | List of [{host, port}, ...] for round-robin failover. Tried in order on connect failure. |
Compression #
| Key | Type | Default | Description |
|---|---|---|---|
compression | bool / string | false | false/"none" = uncompressed; true/"lz4" = LZ4 (fast); "zstd" = ZSTD (denser) |
max_compression_chunk_size | int | 65535 | Block size used by the compressor |
Timeouts and retry #
| Key | Type | Default | Description |
|---|---|---|---|
connect_timeout | int (sec) | 5 | TCP connect deadline |
connect_timeout_ms | int (ms) | (none) | Sub-second connect deadline; overrides the seconds key when set |
receive_timeout | int (sec) | 0 | Read deadline (0 = no timeout) |
receive_timeout_ms | int (ms) | (none) | Sub-second read deadline; overrides the seconds key when set |
send_timeout | int (sec) | 0 | Write deadline |
send_timeout_ms | int (ms) | (none) | Sub-second write deadline; overrides the seconds key when set |
retry_count | int | 1 | Send retries on transient failure |
retry_timeout | int (sec) | 5 | Sleep between retries |
tcp_nodelay | bool | true | TCP_NODELAY |
tcp_keepalive | bool | false | TCP keepalive |
tcp_keepalive_idle | int (sec) | 60 | Idle time before first keepalive probe |
tcp_keepalive_intvl | int (sec) | 5 | Interval between probes |
tcp_keepalive_cnt | int | 3 | Failed probes before declaring dead |
TLS #
Requires a build with --enable-clickhouse-openssl. Building without it and passing ssl => true raises ClickHouseException at construct time.
| Key | Type | Default | Description |
|---|---|---|---|
ssl | bool | false | Enable TLS |
ssl_min_protocol_version | string | tls1.2 | Minimum protocol; one of tls1.0, tls1.1, tls1.2, tls1.3 |
ssl_skip_verify | bool | false | Skip cert validation; dev only |
ssl_use_default_ca | bool | true | Trust the system CA bundle |
ssl_ca_files | string | array | (none) | PEM CA file path(s) |
ssl_ca_directory | string | (none) | OpenSSL hashed-cert directory |
Methods #
Every call signature is grouped below. $params, $query_id, and $settings mean the same thing everywhere: client-side / server-side placeholders, an optional query id propagated to the server, and a per-call settings override.
Constructor & DDL #
$ch = new ClickHouse(array $config);
// Run any statement that returns no result set (DDL, INSERT ... SELECT, etc.)
$ch->execute(string $sql,
array $params = [],
string $query_id = "",
array $settings = []);
$ch->ping(); // returns true on success, throws on failure
Reading #
// Buffered read into a PHP array. $fetch_mode is the bitmask from "Fetch modes".
$rows = $ch->select(string $sql,
array $params = [],
int $fetch_mode = 0,
string $query_id = "",
array $settings = []);
// Read with external in-memory tables sent alongside the query.
// Each entry: ['name' => 'ext_x', 'columns' => ['c' => 'Type', ...],
// 'rows' => [[...], [...], ...]]. The query body references
// the external table by name (e.g. `WHERE id IN ext_ids`). Keeps the
// SQL small when filtering by big lists; multiple externals per call.
$rows = $ch->selectWithExternalData(string $sql,
array $externals,
array $params = [],
int $fetch_mode = 0,
string $query_id = "",
array $settings = []);
Inserting #
// Bulk insert (entire dataset in one call), rows as positional arrays
// matching $columns.
$ch->insert(string $table, array $columns, array $values,
string $query_id = "",
array $settings = []);
// Same as insert(), but rows are associative arrays and the column list
// is derived from the first row's keys.
$ch->insertAssoc(string $table, array $rows,
string $query_id = "",
array $settings = []);
Streaming reads & writes #
For datasets larger than memory, or when you want to pipe rows to/from a file. The CSV/TSV stream helpers are documented with examples under CSV / TSV streaming.
// Streaming insert: open a block, append rows, close. write() flushes
// blocks to the server as you go.
$ch->writeStart(string $table, array $columns,
string $query_id = "",
array $settings = []);
$ch->write(array $values);
$ch->write(array $more_values);
$ch->writeEnd();
// Parse a TSV / CSV stream and insert in batches of $batch_rows. Returns
// rows inserted. See "CSV / TSV streaming".
$ch->insertFromStream(string $table, array $columns, mixed $stream,
string $format = "TabSeparated",
int $batch_rows = 10000,
string $query_id = "",
array $settings = []);
// Lazy read. selectStream() buffers raw result blocks for the iterator's
// lifetime so it stays re-iterable (Countable, rewind); it does NOT reduce
// peak memory for a huge result. For a genuinely streaming read holding one
// row at a time, use selectStreamCallback(). $fetch_mode is the trailing arg.
$iter = $ch->selectStream(string $sql, array $params = [],
string $query_id = "", array $settings = [],
int $fetch_mode = 0);
foreach ($iter as $row) { /* ... */ } // ClickHouseRowIterator: Iterator + Countable
$ch->selectStreamCallback(string $sql, callable $cb,
array $params = [], string $query_id = "",
array $settings = [],
int $fetch_mode = 0); // true per-row stream
// Write query rows straight to a PHP stream as TSV / CSV. Returns rows
// written. See "CSV / TSV streaming".
$ch->selectToStream(string $sql, array $params, mixed $stream,
string $format = "TabSeparated",
string $query_id = "", array $settings = []);
Result wrapper #
// smi2-style result wrapper: returns a ClickHouseStatement (Iterator,
// Countable, ArrayAccess, JsonSerializable) carrying a per-call stats
// snapshot. Use it for fetchOne / fetchKeyPair / fetchColumn on the result,
// or to keep stats around after running other queries. Plain $ch->select()
// is faster when you just need the array.
$stmt = $ch->selectStatement(string $sql, array $params = [],
string $query_id = "", array $settings = []);
foreach ($stmt as $row) { /* ... */ }
$stmt[0]; count($stmt); json_encode($stmt);
$stmt->fetchOne(); $stmt->fetchKeyPair(); $stmt->fetchColumn();
$stmt->toArray(); $stmt->statistics();
Settings & observability #
$ch->setSettings(array $settings); // client-wide; per-call overrides; chainable
$ch->setSetting(string $key, mixed $value); // single-key sugar, chainable
$ch->setDatabase(string $database); // USE on the server, updates default; chainable
$ch->setProgressCallback(?callable $cb);
$ch->setProfileCallback(?callable $cb);
$ch->setVerbose(true); // JSON lifecycle lines on STDERR
$ch->setVerbose(fn($e, $ctx) => ...); // or custom sink
$ch->setVerbose(false); // disable
$stats = $ch->getStatistics(); // last query: rows, bytes, elapsed_ms, query_id
Connection & server info #
$ch->resetConnection();
$info = $ch->getServerInfo(); // name, version_*, revision, timezone, display_name
$ep = $ch->getCurrentEndpoint(); // {host, port} of the active endpoint, or null
$ch->enableLogQueries(bool $enabled = true);
$log = $ch->getLogQueries(); // returns and clears the buffer (see Query log)
DDL & introspection helpers #
$ch->isExists(string $database, string $table);
$ch->showDatabases();
$ch->showProcesslist();
$ch->getServerVersion();
$ch->databaseSize(?string $database = null); // {bytes_on_disk, rows}
$ch->tablesSize(?string $database = null);
$ch->tableSize(string $table); // {rows, bytes_on_disk, partitions, modification_time}
$ch->partitions(string $table);
$ch->showTables(?string $database = null, ?string $like = null);
$ch->showCreateTable(string $table);
$ch->getServerUptime(); // seconds
$ch->truncateTable(string $table);
$ch->dropPartition(string $table, string $partition);
JSON columns #
The JSON column type round-trips through PHP.
Reads require output_format_native_write_json_as_string = 1 on the session (set it via setSettings() or a per-query settings array). The vendored client only understands the string serialization; a read without this setting fails with a protocol error.
$ch->setSettings([
"allow_experimental_json_type" => 1, // only needed to CREATE a JSON column
"output_format_native_write_json_as_string" => 1, // required to SELECT JSON
]);
// Insert: a PHP array or object is JSON-encoded; a string is stored as
// raw JSON text (validated client-side); null becomes the empty object {}.
$ch->insert("events", ["id", "payload"], [
[1, ["level" => "info", "tags" => ["a", "b"]]],
[2, '{"level":"warn"}'],
]);
// Read: raw JSON string by default.
$ch->select("SELECT payload FROM events");
// JSON_AS_ARRAY decodes to a nested associative array;
// JSON_AS_OBJECT decodes to nested stdClass.
$ch->select("SELECT payload FROM events", [], ClickHouse::JSON_AS_ARRAY);
$ch->select("SELECT payload FROM events", [], ClickHouse::JSON_AS_OBJECT);
CSV / TSV streaming #
Two methods move rows directly between a ClickHouse table and a PHP stream resource, formatting or parsing the bytes in C++ so neither path builds a full PHP result array. Both accept these formats: TabSeparated (alias TSV), TabSeparatedWithNames (alias TSVWithNames), CSV, and CSVWithNames.
Reading: selectToStream()
Writes query rows to a stream as TSV/CSV, block-by-block, and returns the number of rows written. Dates and datetimes emit ISO strings; Decimal / Int128 / UInt128 / UUID / IPv4 / IPv6 emit their string forms. NULL becomes \N in TSV and an unquoted \N in CSV (a real string equal to "\N" is quoted so it can't be misread as NULL).
$f = fopen("/tmp/events.tsv", "wb");
$n = $ch->selectToStream(
"SELECT id, ts, tag FROM events ORDER BY id",
[], $f, "TabSeparatedWithNames"
);
fclose($f);
echo "$n rows written\n";
Composite and geo columns are rejected: Array, Tuple, Map, Point, Ring, Polygon, and MultiPolygon can't be unambiguously serialized to a flat text format, so selectToStream() throws naming the offending type. Use select() for those. (Nullable and LowCardinality wrappers around a streamable scalar are fine.)
Writing: insertFromStream()
Parses a TSV/CSV stream and inserts rows in batches of $batch_rows (default 10000), returning the number of rows inserted. Only one batch of per-column zvals exists at a time, so inputs larger than memory work. Cells arrive as strings and run through the same write coercion as insert() for each server-supplied column type, so a numeric column still needs a valid numeric string.
$f = fopen("/tmp/events.csv", "rb");
$n = $ch->insertFromStream(
"events", ["id", "ts", "tag"], $f,
"CSVWithNames", 5000
);
fclose($f);
echo "$n rows inserted\n";
- NULL marker: TSV uses the
\Nescape (a literal\\Nround-trips as the two-byte string); CSV uses an unquoted literal\Ncell. An empty CSV cell becomes an empty string, not NULL. - A
\Ncell aimed at a non-Nullablecolumn throws; reaching NULL needs aNullable(T)destination. - The
*WithNamesformats skip the first row as a header. batch_rows < 1, a non-stream argument, or an empty column list all throw.
Placeholders #
Two placeholder syntaxes are supported in select / execute:
{name}is client-side identifier substitution. Two value shapes:- Scalar (string / int / float): coerces to a single token, validated as either an identifier (
[A-Za-z_][A-Za-z0-9_]*, optionally db-qualified by one dot likedb.tbl) or a numeric literal. Whitespace, commas, quotes, semicolons, backslashes, and other SQL meta-characters are rejected. - Array: each element is validated as a single scalar token, then joined with
", "for the SQL replacement. Use this for legitimate column lists; an element with internal whitespace or commas is still rejected. A scalar value containing commas is rejected; that ambiguity (single identifier vs. list) is the point of the array-shape API.
- Scalar (string / int / float): coerces to a single token, validated as either an identifier (
{name:Type}is a server-side typed parameter. The SQL text is passed through unchanged; the value is bound viaQuery::SetParamand the server quotes and parses it according toType. Pass PHP arrays forArray(T)types;nullbecomes a serverNULL.
// Single-identifier substitution.
$ch->select("SELECT * FROM {tbl}", ["tbl" => "users"]);
// Column-list substitution via array value.
$ch->select("SELECT {cols} FROM users",
["cols" => ["id", "name", "email"]]);
// Server-side typed parameters, no client-side quoting needed.
$ch->select("SELECT * FROM users WHERE id IN ({ids:Array(UInt32)})",
["ids" => [1, 2, 3]]);
Settings #
setSettings() applies client-wide. The 5th argument on select / insert / execute / writeStart overrides per call. Both accept plain string => string pairs; PHP scalars are stringified for you.
$ch->setSettings(["max_execution_time" => "30"]);
// Per-call override.
$ch->select("SELECT * FROM big_table",
[], 0, "",
["max_execution_time" => "5",
"max_memory_usage" => "1000000000"]);
Statistics and progress #
$ch->setProgressCallback(function (array $p) {
fprintf(STDERR, "rows=%d bytes=%d\n", $p["rows"], $p["bytes"]);
});
$ch->select("SELECT count() FROM big_table");
$stats = $ch->getStatistics();
// rows_read, bytes_read, total_rows, written_rows, written_bytes,
// blocks, rows_before_limit, applied_limit, elapsed_ms
A progress or profile callback that throws unwinds the client's packet loop mid-stream. Call resetConnection() before reusing the client after a callback throws.
Query log #
enableLogQueries(true) turns on a per-client buffer that records each completed select / insert / execute / streaming writeEnd. Each entry is {sql, query_id, elapsed_ms, rows_read, bytes_read, error_code, error_message}. sql is capped before retention. error_code is 0 on success, the server error code on a ServerException, or -1 on client/network failure. getLogQueries() returns the buffer and clears it.
$ch->enableLogQueries(true);
$ch->select("SELECT count() FROM users");
$ch->insert("logins", ["user_id", "ts"], $batch);
foreach ($ch->getLogQueries() as $q) {
fprintf(STDERR, "[%.1fms] %s\n", $q["elapsed_ms"], $q["sql"]);
}
Structured exceptions #
ClickHouseException carries three extra public properties:
server_code: ClickHouse error code (e.g. 159 =TIMEOUT_EXCEEDED).0for client-side errors.server_name: server-reported exception name (e.g.DB::Exception).nullfor client-side errors.query_id: the query id associated with the failed call, when one was supplied.nullotherwise.