I Submitted a Proposal for Making Government Statistics Machine-Readable

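The government was soliciting opinions on how to make government statistics machine-readable, so I submitted a proposal. That said, it is nothing grand; it is merely what anyone familiar with the field would think of, so there are probably plenty of other people making proposals to the same effect. I just happened to have been reading up on RDF recently, so, well, while I was at it.

I responded to the following.

The content is below. Struck-through text marks basic errors I noticed after sending.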


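[Self-introduction removed] I do not count as one of the “users who use e-Stat on a regular basis,” nor am I by any means an expert in statistical data processing, but please allow me to offer my opinion.

In statistical data, whether “Tokyo” in one dataset and “Tokyo” in another dataset refer to the same Tokyo Metropolis depends on context. In some cases, it may even refer to a restaurant named “Tokyo.”

One attempt to eliminate this kind of ambiguity and define things rigorously is the Resource Description Framework (RDF). It is a framework devised for knowledge representation.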
https://en.wikipedia.org/wiki/Resource_Description_Framework

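It is not widely known, but there is, for example, a database that expresses the facts stored in Wikipedia in RDF as far as possible, and the summary shown in the right-hand pane when you search Google for a keyword such as “Tokyo” is reportedly generated using this database.

When there are many kinds of datasets, having each item rigorously defined in RDF makes compound analyses that combine them easier and thus encourages them. Without rigorous definitions, a person always has to interpret the meaning of the data and go through steps such as verifying whether “Tokyo” in another dataset actually refers to the same thing; with rigorous definitions, those steps become unnecessary.

There is no denying that RDF descriptions, precisely because they are rigorous, are extremely cumbersome and the barrier to entry is high. However, considering that the statistical tables produced by the government probably overlap a great deal in scope and in items across ministries and bureaus, and that the same department will probably keep producing the same kind of statistics, setting up this tooling comprehensively at the outset should be a worthwhile investment. In particular, when the same statistics are compiled every year, once a template exists all that remains is to swap in the numbers, so it should not place a heavy burden on clerical staff either.

Thank you for your consideration.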
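
To make the ambiguity described in the letter concrete, here is a minimal Python sketch that was not part of the submitted proposal. It models RDF-style (subject, predicate, object) triples as plain tuples. Every URI under example.org, the dataset contents, the figures, and the helper name `values_for` are made up for illustration; a real deployment would presumably use an RDF library and agreed-upon vocabularies.

```python
# Two datasets that both use the bare label "Tokyo" for different things.
# Every example.org URI below is hypothetical and only illustrates the idea.
TOKYO_METROPOLIS = "http://example.org/place/tokyo-metropolis"
TOKYO_RESTAURANT = "http://example.org/restaurant/tokyo"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
POPULATION = "http://example.org/stat/population"
SEATS = "http://example.org/stat/seats"

population_dataset = [
    (TOKYO_METROPOLIS, LABEL, "Tokyo"),
    (TOKYO_METROPOLIS, POPULATION, 14000000),  # illustrative figure only
]
restaurant_dataset = [
    (TOKYO_RESTAURANT, LABEL, "Tokyo"),
    (TOKYO_RESTAURANT, SEATS, 30),  # illustrative figure only
]


def values_for(triples, subject, predicate):
    """Return every object stored for the given subject and predicate."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


merged = population_dataset + restaurant_dataset

# Matching on the label alone conflates the two entities...
by_label = {s for (s, p, o) in merged if p == LABEL and o == "Tokyo"}
# ...while matching on the URI keeps them apart without human judgment.
population = values_for(merged, TOKYO_METROPOLIS, POPULATION)
```

Here `by_label` contains both subjects, which is exactly the situation where a person would have to step in, whereas the lookup by URI is unambiguous.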

Functional Reactive Programming

Functional Reactive Programming, or FRP, is very hard for me to understand, as if functional programming itself were not hard enough. This concept is supposed to have been popularized by Conal Elliott. I tried to read what is supposed to be his seminal paper (“Genuinely Functional User Interfaces”), but it went straight over my head. He also gives an answer to a question on StackOverflow, “terminology – What is (functional) reactive programming? – Stack Overflow,” which is a bit more approachable.

Elm, a functional programming language for Web apps (Wikipedia), parted ways with FRP.

Cellx, which bills itself as “the ultra-fast implementation of reactivity for Javascript,” is a whole lot easier to understand; see “Building a Reactive App using Cellx and React | 60devs.” I can see that it is reactive, but I do not think it is functional.
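
To see what “reactive” means in this context, the following is a minimal Python sketch of a reactive cell. It is not Cellx's API; the class `Cell` and its methods are hypothetical names chosen for illustration. A derived cell recomputes whenever a cell it depends on changes.

```python
class Cell:
    """A value that notifies dependents when it changes (hypothetical sketch)."""

    def __init__(self, value=None, compute=None, sources=()):
        self._compute = compute
        self._sources = list(sources)
        self._dependents = []
        # Register this cell with each source so changes reach it.
        for source in self._sources:
            source._dependents.append(self)
        if compute is not None:
            self.value = compute(*[s.value for s in self._sources])
        else:
            self.value = value

    def set(self, value):
        """Assign a new value and propagate the change downstream."""
        self.value = value
        for dependent in self._dependents:
            dependent._recompute()

    def _recompute(self):
        self.value = self._compute(*[s.value for s in self._sources])
        for dependent in self._dependents:
            dependent._recompute()


first = Cell("Ada")
last = Cell("Lovelace")
full = Cell(compute=lambda f, l: f"{f} {l}", sources=[first, last])

before = full.value
first.set("Augusta")  # an in-place mutation drives the update
after = full.value
```

Note that the update is triggered by mutating `first` in place, which is one way to read the impression that such a design is reactive without being functional.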

Speaking of FRP and JavaScript, there are also RxJS (see “RxJS – Introduction”) and Cycle.js.

Gun, “the database for freedom fighters” (often stylized “GUN” in official documents, but I will write “Gun” instead because I have not seen any proof or even indication that the word is an acronym) claims to support FRP because it supports “a code structure that emphasizes vertical readability by avoiding nested loops and callbacks.” Does this meet the requirement of FRP? It seems to mean that it only supports operator (function) chaining and I do not think that alone means it supports FRP.

CouchDB-Related Products

IBM Cloudant is a CouchDB hosting service, possibly one of the very few remaining of its kind.

Barrel, formerly known as RCouch, seems to add P2P and local data to CouchDB, but I cannot find any comprehensive documentation. I am not sure if it has been updated recently.

PouchDB, “the JavaScript Database that Syncs!”, reimplements a lot of CouchDB functionality in JavaScript and syncs to CouchDB-compatible servers.

RxDB, a “realtime database for JavaScript applications” can use a PouchDB backend and sync to CouchDB-compatible servers. Actually, my impression is that this was made on top of PouchDB to add schema capabilities based on JSON Schema. The claim that it is “realtime” is a stretch; what it means is that queries are not one-off and can continue to emit events as data changes, which is conducive to reactive programming for the front-end. This reactivity comes from RxJS.

What is particularly of note about RxDB is that it has a leader-election mechanism. When you have multiple tabs in the same browser, all of which communicate with the same remote CouchDB instance, their communication can be redundant if they are naïvely implemented. The leader-election mechanism lets just one RxDB instance communicate and thus removes the redundancy. I do not think PouchDB has an equivalent facility. See also “Local-first database: RxDB + PouchDB | Jared Forsyth.com.”
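
The saving from leader election can be sketched as follows. This is a toy Python model, not RxDB's implementation: the `Tab` class, the lowest-ID election rule, and the request counter are all assumptions made for illustration.

```python
class Tab:
    """One browser tab holding a database instance (hypothetical model)."""

    def __init__(self, tab_id):
        self.tab_id = tab_id
        self.requests_sent = 0

    def replicate(self):
        """Stand-in for one sync round-trip with the remote server."""
        self.requests_sent += 1


def elect_leader(tabs):
    """Pick one tab deterministically; the lowest ID wins in this sketch."""
    return min(tabs, key=lambda tab: tab.tab_id)


def sync_naive(tabs):
    """Every tab talks to the server on its own."""
    for tab in tabs:
        tab.replicate()


def sync_with_leader(tabs):
    """Only the elected leader talks to the server."""
    elect_leader(tabs).replicate()


naive_tabs = [Tab(i) for i in range(3)]
sync_naive(naive_tabs)
naive_total = sum(tab.requests_sent for tab in naive_tabs)

led_tabs = [Tab(i) for i in range(3)]
sync_with_leader(led_tabs)
led_total = sum(tab.requests_sent for tab in led_tabs)
```

With three tabs, the naive version sends one request per tab per sync round, while the leader-based version sends a single request regardless of how many tabs are open.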

Hoodie is a hosted frontend development service which provides the backend via a combination of CouchDB and PouchDB.

Portable Bluetooth Keyboards with a Touchpad

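I do not think I will end up getting one, but since I went and looked into it, here are my notes.

As for models that have not only a touchpad but also buttons for it, the only one I can find is iClever's folding keyboard IC-BK08 (manual; snapshot of it). Its list price is about 5,500 yen. Even with a coupon it is about 4,500 yen, which is by no means cheap. It is also sold under other product names, but those are, if anything, more expensive rather than cheaper. I cannot find an equivalent product on AliExpress or eBay. One thing that bothers me a little is the following: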

Answer: According to the manual, on Android and Windows a two-finger tap acting as a right click is the intended behavior. As for iOS/iPadOS, typing is comfortable, so I do not mind.
Posted by: 1968@51, posted on: 2020/04/28

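If you give up the touchpad buttons, a similar product costs about 3,000 yen. The same item is sold by many sellers on AliExpress under the model number “B033,” so picking a cheap one there brings it down to about 2,500 yen. With some effort you can probably find it from a domestic seller at a price not much higher than this.

The reason I wanted a touchpad that also has buttons comes from my past experience of using a Bluetooth keyboard with a tablet running Windows 10. Not having a touchpad was very inconvenient, and I ended up adding a mouse later. The scenario I have in mind now is quickly entering a small to medium amount of text into a smartphone while out and about, so the buttons may actually not matter that much.

If you give up foldability, which helps portability, but not the touchpad buttons, there is Sanwa Direct's Bluetooth keyboard (touchpad, compact, rechargeable, iPhone, iPad, isolation, pantograph, multi-pairing, English layout) 400-SKB066, which is sold at its list price of 4,980 yen. A keyboard that does not fold will need some way to protect the key surface when carried. Or rather, it is about 40 cm long, so perhaps it is not suited to carrying around after all. Reviews below: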

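I thought the Logicool Wireless Touch Keyboard K400 Plus was a similar product, but it connects via Unifying, not Bluetooth.

There is also one that does not fold and comes with a touchpad only (no buttons). The touchpad area can apparently be used as a numeric keypad by switching modes, which is unique. Its reviews are poor, though.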

The English Keyboard Rii mini K12+/i12+ Wireless Keyboard and K12+ Bluetooth Keyboard with Touchpad simply comes with a touchpad.

What I Wish They Had Told Me about GraphQL

It took me a long time to understand the benefits of GraphQL. The official website summarizes it on the front page as:

A query language for your API

GraphQL is a query language for APIs and a runtime for fulfilling those queries with your existing data. GraphQL provides a complete and understandable description of the data in your API, gives clients the power to ask for exactly what they need and nothing more, makes it easier to evolve APIs over time, and enables powerful developer tools.

I had a really hard time trying to understand what this means.

I had to read the following articles to get a better sense of GraphQL:

GraphQL as I See It

The fundamental assumption is that there already exists usable data from the beginning. Let’s call it “the original data.” GraphQL comes into the picture to add another layer with regard to access to this data, which increases its utility. Potential benefits include: i) precise definition of the accessible data; ii) querying capabilities that yield just the data you ask for and nothing else; and iii) improved performance. The last claim can be true only in specific circumstances, which we will look at later.

One crude analogy could be made here, and that is that GraphQL is like SQL in that it provides data definition and querying, except that GraphQL is only added as an afterthought to a database that is already operational.

There is not any particular restriction or assumption on the means of data fetching between the GraphQL layer and the original data. The original data can reside in local files, local databases, or remote databases — or it can even be simple variables in the GraphQL implementation language (there are many).

Two Parties Involved in Use of GraphQL

The following two parties are involved in introducing GraphQL to the original data and in using it: the GraphQL Manager and the GraphQL User (my own terms, not the GraphQL developers’). Their roles are as follows:

The GraphQL Manager:

  • Defines type-wise structure of the data available for querying, in a schema-like fashion; this will be visible to the GraphQL User by introspection
  • Implements the methods to access the data, in the form of “resolvers”; this will not be visible to the GraphQL User

The GraphQL User:

  • Queries the data within the bounds of the GraphQL query format defined by the GraphQL Manager

Note that the GraphQL Manager is not, in that role, responsible for managing the original data. Even though he “defines” the structure of the data available through the GraphQL interface, this definition does not determine how the original data is actually stored and managed. Rather, he defines which parts of the original data he wants to be exposed through the GraphQL interface.
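
The division of roles can be sketched in plain Python. This is a toy model under my own assumptions, not a real GraphQL implementation: the names `ORIGINAL_DATA`, `SCHEMA`, `RESOLVERS`, and `query_book` are hypothetical, and the field check stands in for the validation a real GraphQL runtime would perform.

```python
# The "original data": it exists before GraphQL enters the picture.
ORIGINAL_DATA = {
    1: {"title": "Book A", "author": "Author X", "internal_cost": 12.5},
    2: {"title": "Book B", "author": "Author Y", "internal_cost": 8.0},
}

# Manager, part 1: the structure that is visible to the User.
# Note that "internal_cost" is deliberately not exposed.
SCHEMA = {"Book": {"title": "String", "author": "String"}}

# Manager, part 2: resolvers, hidden from the User. One per exposed field.
RESOLVERS = {
    "Book": {
        "title": lambda record: record["title"],
        "author": lambda record: record["author"],
    }
}


def query_book(book_id, fields):
    """User side: ask for specific fields of one Book and get only those back."""
    record = ORIGINAL_DATA[book_id]
    result = {}
    for field in fields:
        if field not in SCHEMA["Book"]:
            raise KeyError(f"Field '{field}' is not exposed on Book")
        result[field] = RESOLVERS["Book"][field](record)
    return result


only_title = query_book(1, ["title"])
```

The User gets exactly the requested fields, and `internal_cost` stays out of reach even though it is present in the original data.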

Where GraphQL Is Inserted

In the typical scenario where a server holds data and clients request it, GraphQL is usually added on the server’s side, so the server can provide a better interface with querying capabilities to its clients. This seems to be what is usually assumed in the documents on GraphQL that I have read.

I do not think this has to be the only way, however. You could use GraphQL in a bridge that sits between the clients and the server. Or you could even use it on the client’s side. As long as the GraphQL Manager can implement the resolvers that fit the given scenario, you should be able to enjoy the benefits of GraphQL.

Increased Performance?

One of the purported benefits of GraphQL is increased performance. However, note that this applies only when GraphQL is used to implement a Web API and its performance is compared with that of a strict REST API (not a Web API in general), and only when the GraphQL Manager has carefully implemented the relevant resolvers in a performant way.

Generally speaking, increased performance is not inherent in GraphQL by itself, despite the impression that the documents on GraphQL might give you; it does not “come for free” just by using GraphQL alone. On the contrary, a naïve implementation of resolvers by the GraphQL Manager is likely to give you pretty bad performance.

The following resources helped me understand this issue:

Performance becomes an issue when nested objects are allowed to be requested and are indeed requested. Let’s say you want a list of movies that match certain criteria (say, older than 20 years), along with the list of performers for each such movie. Such a query returns nested objects: a list of movies, each carrying its own list of performers. In relational database terminology, there is either a one-to-many or a many-to-many relation between the Movie entity and the Performer entity.

If this query is to be executed against a database server with a REST API, there will be roughly N+1 requests, the so-called N+1 problem:

  • One query is sent to the /api/movies endpoint, for example, to get the list of N movies
  • One query is sent to the /api/performer/id endpoint for each performer who appeared in each of the N movies above

If the API is implemented in the strict sense of REST, then this is inevitable by its very definition. This is exactly what the REST architecture prescribes. In contrast, there is no such limitation to implementing a GraphQL-based Web API, so you can just provide one endpoint for handling all the GraphQL queries.

In GraphQL, a resolver is implemented for each field of an object type. In the scenario above, if the resolvers for [Movie] and Performer(id) are implemented naively, then there will also be N+1 resolver function invocations and the performance will be just as bad. However, unlike a REST API, whose architectural design decisions make this inescapable, GraphQL leaves room for performance tuning.

I do not know what kinds of performance tuning methods there are, but memoization is definitely one of them. More importantly, since this is such a common occurrence with GraphQL, there are already libraries to mitigate the situation. These include DataLoader (originally from Facebook, just like GraphQL) and Apollo data sources (see “ReactとApolloを使ってGithub GraphQL APIを試してみる – Qiita” and “世のフロントエンドエンジニアにApollo Clientを布教したい – Qiita”). I have not looked into these, so I do not know exactly what they do.
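
As a rough sketch of the general idea of batching, the Python below counts round-trips for a naive resolution and for a batched one. It is an assumption about the technique in general, not a description of how DataLoader or Apollo data sources work internally; the in-memory “database,” the function names, and the call counter are all hypothetical.

```python
# Hypothetical in-memory "database"; each fetch_* call stands for one round-trip.
MOVIES = [
    {"id": 1, "title": "Movie A", "performer_ids": [10, 11]},
    {"id": 2, "title": "Movie B", "performer_ids": [11, 12]},
    {"id": 3, "title": "Movie C", "performer_ids": [10]},
]
PERFORMERS = {10: "Performer P", 11: "Performer Q", 12: "Performer R"}

calls = {"count": 0}


def fetch_movies():
    calls["count"] += 1
    return MOVIES


def fetch_performers(ids):
    """Fetch several performers in one round-trip."""
    calls["count"] += 1
    return {pid: PERFORMERS[pid] for pid in ids}


def resolve_naive():
    """One fetch for the movie list, then one more fetch per movie."""
    result = []
    for movie in fetch_movies():
        names = fetch_performers(movie["performer_ids"])
        result.append((movie["title"], [names[p] for p in movie["performer_ids"]]))
    return result


def resolve_batched():
    """Collect all needed IDs first, then fetch them in a single batch."""
    movies = fetch_movies()
    wanted = sorted({pid for m in movies for pid in m["performer_ids"]})
    names = fetch_performers(wanted)
    return [(m["title"], [names[p] for p in m["performer_ids"]]) for m in movies]


calls["count"] = 0
naive_result = resolve_naive()
naive_calls = calls["count"]  # 1 + N round-trips, with N = 3 movies

calls["count"] = 0
batched_result = resolve_batched()
batched_calls = calls["count"]  # 1 for the movies + 1 for all performers
```

Both versions produce the same nested result; only the number of round-trips differs, and deduplicating the IDs before the batch fetch plays a role similar to memoization.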

Addendum: Apparently I was not the only one who found it odd that this recommended design pattern is not automated. “gajus/graphql-lazyloader: GraphQL directive that adds Object-level data resolvers” addresses it. Also, “Moving OkCupid from REST to GraphQL” seems interesting.

For application development with React (also from Facebook), there is Relay, “the production-ready GraphQL client for React.” A Japanese document explains how to use Relay with React and GraphQL. It deals with pagination, and a similar discussion can be found in “Pagination | GraphQL.”

My Small Grievance about GraphQL’s Notation

In GraphQL’s schema definition, you can define object types and query types, among other things. An object type can have multiple fields (or “slots” in the Lisp nomenclature), which means that the type is an aggregate of those fields. A query type definition follows exactly the same format as an object type definition; in the example below, it is written with the same “type” keyword and is distinguished only by its name, “Query.” However, the fields of a query type do not together constitute a single query pattern.

Take the following query type for example (taken from “Schema basics – Apollo Server – Apollo GraphQL Docs”):

type Query {
  books: [Book]
  authors: [Author]
}

This query type definition does not mandate that every query be always one for a list of books and a list of authors at the same time. Rather, it means you can query for a list of books or a list of authors, or both. This goes contrary to the naïve aggregate assumption you might get from object type definitions — I certainly did! I do not think this was explained this way in any of the resources I referenced. It took me a long while to realize this.

Each field of a query type definition is an entry point for queries, for which the GraphQL Manager has to provide a resolver, as we saw earlier.
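
The “entry point” reading can be sketched in Python as follows. This is a toy dispatcher under my own assumptions, not how any particular GraphQL server is implemented; `QUERY_RESOLVERS`, `execute`, and the sample records are hypothetical.

```python
invoked = []  # records which root resolvers actually ran


def resolve_books():
    invoked.append("books")
    return [{"title": "Book A"}]


def resolve_authors():
    invoked.append("authors")
    return [{"name": "Author X"}]


# One resolver per field of the Query type.
QUERY_RESOLVERS = {"books": resolve_books, "authors": resolve_authors}


def execute(requested_fields):
    """Run only the resolvers for the root fields the query actually names."""
    return {field: QUERY_RESOLVERS[field]() for field in requested_fields}


books_only = execute(["books"])
invoked_after_first = list(invoked)
both = execute(["books", "authors"])
```

A query that names only `books` never touches the `authors` resolver, which is why the Query type reads better as a menu of entry points than as an aggregate.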