2009-12-17

Simplify metaprogramming

To avoid building overcomplicated solutions, it's reasonable to:

* Put as little as possible into the DSL.
* Think about any DSL in terms of its AST and keep its syntax as close to the AST as possible. This may let you avoid writing and debugging parsers altogether.
* Select simple, easy-to-learn tools and languages (or subsets of languages) so that new people can join the project quickly.
* Choose one of the "programmable programming languages", i.e. an extensible language that directly supports homogeneous metaprogramming.
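
To illustrate the "syntax close to AST" point, here is a minimal sketch that assumes Java as the host language. The DSL is a toy arithmetic language, and the names AstDsl, Expr, Num, Add, Mul and the num/add/mul helpers are all hypothetical. The program is written directly as its tree, so no parser is needed.

```java
public class AstDsl {
    // The DSL is nothing but its AST: every construct is a node type.
    public interface Expr { int eval(); }

    public static final class Num implements Expr {
        final int value;
        Num(int value) { this.value = value; }
        public int eval() { return value; }
    }

    public static final class Add implements Expr {
        final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
        public int eval() { return left.eval() + right.eval(); }
    }

    public static final class Mul implements Expr {
        final Expr left, right;
        Mul(Expr left, Expr right) { this.left = left; this.right = right; }
        public int eval() { return left.eval() * right.eval(); }
    }

    // Small factory helpers keep the "syntax" one-to-one with the tree.
    public static Expr num(int v) { return new Num(v); }
    public static Expr add(Expr l, Expr r) { return new Add(l, r); }
    public static Expr mul(Expr l, Expr r) { return new Mul(l, r); }

    public static void main(String[] args) {
        // (2 + 3) * 4, written as the tree itself; no parser involved
        Expr program = mul(add(num(2), num(3)), num(4));
        System.out.println(program.eval());
    }
}
```

The price is a slightly noisier notation; the gain is that the host compiler does all the syntax checking.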

2009-12-13

nginx ssl setup : .crt -> .p12 -> .pem + .key

The initial .crt file can be converted to .p12 format with portecle, a nice open source app. By the way, portecle seems to be more convenient than keytool and a couple of other tools.

Then you'd better transfer that file to your target server. You must have openssl installed there, since the next commands use it.
Run these (replace zzz, of course) to get the key and pem files for nginx HTTPS transport:
openssl pkcs12 -nokeys -in zzz.p12 -out zzz.pem
openssl pkcs12 -nocerts -nodes -in zzz.p12 -out zzz.key


Please note that the key is not encrypted, so beware (that's another reason to transfer the .p12 to your target server beforehand).

UPD: well, those commands dump the certificate chain in arbitrary order. HTTPS worked from Firefox, but Java's JSSE did not validate the certificate chain and refused to connect. You'll have to rearrange the raw text certificates in the .pem by hand for SSL to work.
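
A minimal sketch of the reordering logic, in Java. It assumes the order strict clients expect: server certificate first, each certificate followed by the one that issued it. The Cert class is a hypothetical stand-in holding only subject and issuer names. With real certificates one would compare `X509Certificate.getSubjectX500Principal()` and `getIssuerX500Principal()` instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ChainOrder {
    /** Hypothetical stand-in for an X.509 certificate: just subject and issuer names. */
    public static final class Cert {
        public final String subject, issuer;
        public Cert(String subject, String issuer) {
            this.subject = subject;
            this.issuer = issuer;
        }
    }

    /** Returns the certificates leaf first, each one followed by its issuer. */
    public static List<Cert> leafFirst(List<Cert> certs) {
        Map<String, Cert> bySubject = new HashMap<>();
        Set<String> issuers = new HashSet<>();
        for (Cert c : certs) {
            bySubject.put(c.subject, c);
            // a self-signed root names itself as issuer; that does not count
            if (!c.issuer.equals(c.subject)) issuers.add(c.issuer);
        }
        // The leaf is the certificate that did not sign any other one.
        Cert current = null;
        for (Cert c : certs) {
            if (!issuers.contains(c.subject)) { current = c; break; }
        }
        List<Cert> ordered = new ArrayList<>();
        while (current != null && ordered.size() < certs.size()) {
            ordered.add(current);
            // Stop at a self-signed root or when the issuer is not in the bundle.
            current = current.issuer.equals(current.subject)
                    ? null : bySubject.get(current.issuer);
        }
        return ordered;
    }
}
```

Writing the PEM blocks back out in the returned order gives a file with the leaf first and the root last.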

2009-12-12

nginx centos 5.0 easy install howto

Why is no one able to simply list the three magic commands that get nginx installed on CentOS?..

rpm -ihv http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-3.noarch.rpm
rpm -ihv http://centos.alt.ru/repository/centos/5/i386/centalt-release-5-3.noarch.rpm
yum install nginx-stable


Based on extensive info from here.

Another good stab at this is here. It includes some extra info and preset service wrapper scripts. The RPM-based install worked flawlessly as a service, but you may need those scripts if you're building from source.

The most useful spell for me was this one (I usually forget how to update service run levels).
chkconfig --level 345 nginx on

Documentation is at wiki.nginx.org and sysoev.ru/nginx.

In general nginx seems (to me) to be quite a nice and configurable component, proving that hardcore, low-level, careful programming is still a viable and promising alternative.

2009-11-11

Use the brain, not Google

>> I searched whole web and nothing seem to give me
>> at least a pointer so any help will be greatly
>> appreciated.

> For some tasks it is much more efficient to use the brain
> than Google.

Say, have you got a URL? thebrain.com's search engine didn't match "Newton" at all. What am I missing here?

Mercurial backup script

A simple cmd batch to clone and pack a mercurial repo. I used to run that just before logging off.

rmdir /S /Q repo-snap
hg clone -U repo-work repo-snap

7z a -t7z -mx=9 repo-snap-@stamp.7z repo-snap

rmdir /S /Q repo-snap
java -jar jstamper.jar repo-snap-@stamp.7z

shutdown -s -t 120

pause


PS: jstamper.jar was a simple class which renamed the file so that @stamp was replaced with the current datestamp. Bash does that in a blink, but there's no simple way to do it in cmd.
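
The original jstamper source is not shown here, so the following is only a sketch of what such a class could look like. The class name Stamper, the method stampedName and the yyyyMMdd stamp format are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class Stamper {
    /** Replaces the @stamp placeholder in a file name with a yyyyMMdd date (assumed format). */
    public static String stampedName(String name, LocalDate date) {
        return name.replace("@stamp", date.format(DateTimeFormatter.BASIC_ISO_DATE));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Stamper <file-with-@stamp-in-name>");
            return;
        }
        Path source = Paths.get(args[0]);
        String newName = stampedName(source.getFileName().toString(), LocalDate.now());
        // resolveSibling keeps the renamed file in the same directory
        Files.move(source, source.resolveSibling(newName));
    }
}
```

Keeping the renaming logic in a pure function separate from the file move makes it easy to check without touching the disk.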

2009-10-03

Metaprotocol Taxonomy

Because each application has different communication requirements, there is no simple "best" metaprotocol. Still, some general conclusions are possible. Object-oriented communication, i.e. sending an object model from one machine to another, needs an object-oriented specification (and API where appropriate). The typed metaprotocols (Hessian, RMI, CORBA, JSON) best fit this requirement since they are designed around language types. The syntactic metaprotocols (SOAP, POX) are better suited for document-based applications, i.e. applications which retrieve data through syntactic queries like XPath or XQuery.

The common communication patterns: REST, RPC, Messaging and Streaming also influence the choice of metaprotocol. Since RPC is a typed and API-based pattern, the typed metaprotocols fit better. Messaging applications can use the syntactic metaprotocols if they are document or query based, and can use the typed metaprotocols if they are object model based. Like Messaging, REST can use either typed or syntactic metaprotocols, depending on the applications. Streaming applications like AJAX/Comet fit dynamic typed protocols best.
© Caucho Technology, Inc.; Metaprotocol Taxonomy;

2009-09-13

CDMA Ukraine Setup

There are a few somewhat funny remarks in the screenshot itself. My conclusion: if you are within city limits, the district and the proximity of the cell tower matter about as much as the height. I'll also note that EDGE gave me up to 16 KB per second at this same spot, while CDMA showed about three. The channel latency seems to have dropped by about half, though I still can't say that the speed is stable.

2009-09-05

7zip cmd goodies

Some simple magic spells to ease file management under Windows. Here goes...
  • compressing all the dirs in the CWD to respectively named 7z archives somewhere else:
    set DEST=C:\some\where\else\note\the\trailing\slash\
    for /D %%i in (./*) do @7z a -mx=9 %DEST%%%i.7z %%i
UPD: more goodies are posted separately with the label 7z

2009-08-19

Open Source Bigtable Analogs

Actually, there's no fully comparable analog to Google's Bigtable: all open source projects are weaker in terms of performance and stability. But apparently they are already usable for managing some unstructured datasets, provided no mission-critical data is involved.

Below is the best 3-minute introduction to the whole concept I've seen online, followed by some brief product descriptions.
The hardest part about learning HBase (the open source implementation of Google's BigTable), is just wrapping your mind around the concept of what it actually is.

I find it rather unfortunate that these two great systems contain the words table and base in their names, which tend to cause confusion among RDBMS indoctrinated individuals (like myself).

This article aims to describe these distributed data storage systems from a conceptual standpoint. After reading it, you should be better able to make an educated decision regarding when you might want to use HBase vs when you'd be better off with a "traditional" database.


Bigtable

Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving).

Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.


HBase

HBase uses a data model very similar to that of Bigtable. Applications store data rows in labeled tables. A data row has a sortable row key and an arbitrary number of columns. The table is stored sparsely, so that rows in the same table can have widely varying numbers of columns.

A column name has the form "<family>:<label>" where <family> and <label> can be arbitrary byte arrays. A table enforces its set of <family>s (called "column families"). Adjusting the set of families is done by performing administrative operations on the table. However, new <label>s can be used in any write operation without pre-announcing it. HBase stores column families physically close on disk, so the items in a given column family should have roughly the same read/write characteristics and contain similar data.

Only a single row at a time may be locked by default. Row writes are always atomic, but it is also possible to lock a single row and perform both read and write operations on that row atomically.
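
The data model described above can be approximated in a few lines of Java. This is only an in-memory sketch, not HBase's API. The class SparseTable and its methods are hypothetical, and strings stand in for the byte arrays that HBase actually uses.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

public class SparseTable {
    private final Set<String> families;
    // row key -> (column name "family:label" -> value); rows stay sorted by key
    private final SortedMap<String, SortedMap<String, String>> rows = new TreeMap<>();

    public SparseTable(String... families) {
        this.families = new HashSet<>(Arrays.asList(families));
    }

    public void put(String rowKey, String column, String value) {
        int colon = column.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected family:label, got " + column);
        }
        String family = column.substring(0, colon);
        if (!families.contains(family)) {
            // families are fixed by the table; only labels are free-form
            throw new IllegalArgumentException("unknown column family: " + family);
        }
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    public String get(String rowKey, String column) {
        SortedMap<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }

    /** Sparse storage: a row only holds the columns that were actually written. */
    public int columnCount(String rowKey) {
        SortedMap<String, String> row = rows.get(rowKey);
        return row == null ? 0 : row.size();
    }
}
```

A new label such as tag:science can be written without any declaration, while an undeclared family is rejected at write time.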


Hypertable

The Hypertable data model consists of a multi-dimensional table of information that can be queried using a single primary key. The first dimension of the table is the row key. The row key is the primary key and defines the order in which the table data is physically stored. The second dimension is the column family. This dimension is somewhat analogous to a traditional database column. The third dimension is the column qualifier. Within each column family, there can be a theoretically infinite number of qualified instances.

For example if we were building a URL tagging service, we might define column families content, url, and tag. Within the "tag" column family there could be an infinite number of qualified instances, such as tag:science, tag:theater, tag:good, etc. The fourth and final dimension is the time dimension. This dimension consists of a timestamp that is usually auto assigned by the system and represents the insertion time of the cell in nanoseconds since the epoch. Conceptually, a table in Hypertable can be thought of as a three dimensional Excel spreadsheet with timestamped versions of each cell.
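
The four dimensions (row key, column family, column qualifier, timestamp) map naturally onto nested sorted maps. The Java sketch below is an in-memory illustration only, not Hypertable's API. The class VersionedCells and its method names are hypothetical.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

public class VersionedCells {
    // row key -> "family:qualifier" -> timestamp -> value
    private final SortedMap<String, SortedMap<String, NavigableMap<Long, String>>> table =
            new TreeMap<>();

    public void put(String row, String family, String qualifier, long timestamp, String value) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(family + ":" + qualifier, c -> new TreeMap<>())
             .put(timestamp, value);
    }

    /** Latest version of a cell, or null if the cell was never written. */
    public String latest(String row, String family, String qualifier) {
        NavigableMap<Long, String> versions = versions(row, family, qualifier);
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    /** Version visible at the given time: the newest one not later than timestamp. */
    public String asOf(String row, String family, String qualifier, long timestamp) {
        Map.Entry<Long, String> e = versions(row, family, qualifier).floorEntry(timestamp);
        return e == null ? null : e.getValue();
    }

    private NavigableMap<Long, String> versions(String row, String family, String qualifier) {
        SortedMap<String, NavigableMap<Long, String>> columns = table.get(row);
        if (columns == null) return new TreeMap<>();
        NavigableMap<Long, String> v = columns.get(family + ":" + qualifier);
        return v == null ? new TreeMap<>() : v;
    }
}
```

Each write adds a new timestamped version instead of overwriting the cell, which is the "spreadsheet with timestamped versions" picture from the description.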


Cassandra

Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store. Cassandra brings together the distributed systems technologies from Dynamo and the data model from Google's BigTable. Like Dynamo, Cassandra is eventually consistent. Like BigTable, Cassandra provides a ColumnFamily-based data model richer than typical key/value systems.

Cassandra was open sourced by Facebook in 2008, where it was designed by one of the authors of Amazon's Dynamo. In a lot of ways you can think of Cassandra as Dynamo 2.0. Cassandra is in production use at Facebook but is still under heavy development.

2009-08-13

DLA model parameter space

Here is an image showing how a DLA-based fractal model changes when a couple of density parameters are slightly tweaked. The whole thing is here to generate a fractal probability distribution for the geographical Yook/Jeong/Barabasi network model.
Maybe an IFS-based fractal would be better here, as it might allow more control over the fractal dimension than DLA does. But then I would not have an obvious way to control the exponential density fall-off, and IFS seems less natural than this DLA.
Also, here are some resources on fractal generation I stumbled upon recently:

2009-07-31

The Maven Way

Doh... Maven is a big bunch of hacks. How do I do this, how do I do that. At the first bit of creativity you're slapped with a pile of bizarre arguments and/or unpredictable/unexplainable behaviour. Do it the Maven way or don't do it at all.

Here I would like to dump occasional memos on magic Maven spells.

Dump project dependencies / list dependency tree:
mvn clean install dependency:tree
BTW, why does Maven need to build the project to list the dependencies (I guess downloading the poms should suffice)? UPD: when you are left with a sandbox that broke due to some library versioning clash, it gets quite cool: to find out which lib is pulling in the wrong version, you first need to fix the problem blindly. Yeah, that's The Maven Way to do things, you know...

2009-06-28

Virtual Hosts and HTTPs

Using name-based virtual hosts on a secured connection can be problematic. This is a design limitation of the SSL protocol itself. The SSL handshake, where the client browser accepts the server certificate, must occur before the HTTP request is accessed. As a result, the request information containing the virtual host name cannot be determined prior to authentication, and it is therefore not possible to assign multiple certificates to a single IP address. If all virtual hosts on a single IP address need to authenticate against the same certificate, the addition of multiple virtual hosts should not interfere with normal SSL operations on the server. Be aware, however, that most client browsers will compare the server's domain name against the domain name listed in the certificate, if any (applicable primarily to official, CA-signed certificates). If the domain names do not match, these browsers will display a warning to the client user. In general, only address-based virtual hosts are commonly used with SSL in a production environment.

2009-06-12

Programmer Flavors

My father built custom homes, and in my youth I would occasionally work for him, mostly doing grunt labor and sometimes hanging sheet rock. He and his lead carpenter would tell me that they gave me these jobs for my own good -- so that I wouldn't go into the business. It worked.
So I can also use the analogy that building software is like building a house. We don't refer to everyone who works on a house as if they were exactly the same. There are concrete masons, roofers, plumbers, electricians, sheet rockers, plasterers, tile setters, laborers, rough carpenters, finish carpenters, and of course, general contractors. Each of these requires a different set of skills, which requires a different amount of time and effort to acquire. House-building is also subject to boom and bust cycles, like programming. If you want to get in quick, you might take a job as a laborer or a sheet rocker, where you can start getting paid without much of a learning curve. As long as demand is strong, you have steady work, and your pay might even go up if there aren't enough people to do the work. But as soon as there's a downturn, carpenters and even the general contractor can hang the sheet rock themselves.
© Bruce Eckel; A Career in Computing

2009-03-07

J2SE applets stability

Not long ago I was doing some research into spurious Firefox hangs occurring on applet loading. Looks like Sun sometimes tries to sweep a problem under the carpet. Or, even worse, they attempt to patch their way out.

Neither approach works, as you get either "slightly dysfunctional" components or nasty quantum regressions.

As for the carpet: some of the tickets below are marked "not reproducible", despite the fact that people are reproducing them massively across multiple bugzillas. See the comments for the mojebanka.cz ticket and the last Ubuntu ticket too. Also, there are a couple of tough tickets with massive votes which at some point disappeared from the public bugzilla; I had a bookmark for that somewhere.

So, for those who firmly believe applets are a stable and well-established piece of technology, here is some plain data (somewhat biased, as this is a bugzilla anyway). The second line is the version with the delivered fix:

Well, actually, I don't think that applets are completely infeasible, but the browser/applet bridging seems to be somewhat shaky. My opinion is that applets are the most pluggable, secure and functional approach, at least for some of the possible usage scenarios.

2009-03-05

Open/Closed Principle : Strategic Closure

It should be clear that no significant program can be 100% closed. For example, consider what would happen to the DrawAllShapes function from Listing 2 if we decided that all Circles should be drawn before any Squares. The DrawAllShapes function is not closed against a change like this. In general, no matter how “closed” a module is, there will always be some kind of change against which it is not closed. Since closure cannot be complete, it must be strategic. That is, the designer must choose the kinds of changes against which to close his design. This takes a certain amount of prescience derived from experience. The experienced designer knows the users and the industry well enough to judge the probability of different kinds of changes. He then makes sure that the open-closed principle is invoked for the most probable changes.
© Robert Martin; The Open-Closed Principle;
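
To make the ordering example concrete, here is a small Java sketch of closing DrawAllShapes against one chosen kind of change. It is not taken from the quoted article. The names StrategicClosure, drawAllShapes and CIRCLES_FIRST are hypothetical, and passing the ordering as a Comparator is just one possible way to do it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class StrategicClosure {
    public interface Shape { String draw(); }

    public static final class Circle implements Shape {
        public String draw() { return "circle"; }
    }

    public static final class Square implements Shape {
        public String draw() { return "square"; }
    }

    /** Closed against new shape types and against new orderings: both arrive as arguments. */
    public static List<String> drawAllShapes(List<Shape> shapes, Comparator<Shape> order) {
        List<Shape> sorted = new ArrayList<>(shapes);
        sorted.sort(order); // stable sort: shapes that compare equal keep their input order
        List<String> out = new ArrayList<>();
        for (Shape s : sorted) out.add(s.draw());
        return out;
    }

    /** One ordering policy: circles before everything else. */
    public static final Comparator<Shape> CIRCLES_FIRST =
            Comparator.comparingInt(s -> s instanceof Circle ? 0 : 1);
}
```

The function is now closed against ordering changes, but the policy itself still names Circle, so some other kind of change will always remain open. That is the sense in which closure has to be strategic.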

2009-02-28

ScheduledThreadPoolExecutor : gotcha

This code
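import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SchedulerGotcha {
  public static void main(String[] args) throws InterruptedException {
    final int period = 1000 /*millis*/;  // stagger between task start times
    final int delay = 10 /*millis*/;     // pause between two runs of one task
    final int num = 20;                  // number of tasks to schedule

    // pool with 4 core threads
    ScheduledExecutorService ses =
        Executors.newScheduledThreadPool(4);

    for (int i = 0; i < num; i++) {
      ses.scheduleWithFixedDelay(

        new Runnable() {
          public void run() {
            try {
              // each run takes just under num seconds
              Thread.sleep(num * 1000 - 10);
            } catch (InterruptedException ie) {
              throw new RuntimeException(ie);
            }
            System.out.println("tick: " + System.currentTimeMillis());
          }
        },

        i * period, delay, TimeUnit.MILLISECONDS
      );
    }

    // keep the main thread alive while the tasks tick
    Thread.sleep(Integer.MAX_VALUE);
  }
}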
should produce 20 tasks which in total would tick once a second. But nope. At most 4 tasks run at a time, as corePoolSize does not expand when the pool is unable to handle all scheduled tasks. The actual output would be something like:
tick: 1235810468265
tick: 1235810469237
tick: 1235810470278
tick: 1235810471280
tick: 1235810488274
tick: 1235810489236
tick: 1235810490277
tick: 1235810491299
The only hint for this is left in the javadocs for the ScheduledThreadPoolExecutor:
While this class inherits from ThreadPoolExecutor, a few of the inherited tuning methods are not useful for it. In particular, because it acts as a fixed-sized pool using corePoolSize threads and an unbounded queue, adjustments to maximumPoolSize have no useful effect.
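
A smaller experiment makes the fixed pool size visible without waiting 20 seconds. The sketch below (the names FixedPoolDemo and maxConcurrent are hypothetical) schedules several blocking jobs at once and records the peak number running in parallel. With a pool of 2 the peak cannot exceed 2, however many tasks are queued, so the practical workaround is to size corePoolSize to the number of long-running tasks.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class FixedPoolDemo {
    /** Schedules the given number of blocking jobs at once; returns the peak parallelism seen. */
    public static int maxConcurrent(int poolSize, int tasks) {
        ScheduledThreadPoolExecutor ses = new ScheduledThreadPoolExecutor(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);
        try {
            for (int i = 0; i < tasks; i++) {
                ses.schedule(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(100); // stand-in for a long-running job
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    running.decrementAndGet();
                    done.countDown();
                }, 0, TimeUnit.MILLISECONDS);
            }
            done.await(); // wait until every job has finished
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            ses.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("pool of 2, 6 tasks: peak " + maxConcurrent(2, 6));
        System.out.println("pool of 6, 6 tasks: peak " + maxConcurrent(6, 6));
    }
}
```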

2009-01-22

Design Pattern : Good Citizen

Imagine a software system where there is no need for you to spend your time programming defensively; your objects will be used responsibly, and your methods will always be passed sensible arguments. This low-friction utopia can be approached by establishing some simple programming rules so that every class acts as a 'good citizen' in the society of classes collaborating at runtime. This page outlines some rules that we, and others, believe lead to good citizenship. All are aimed at improving clarity, reducing surprise, and promoting basic consistency. As a good citizen, I...
  • Keep a consistent state at all times - init() or populate() is a code smell.
  • Have no static fields or methods
  • Never expect or return null.
  • Fail fast - even when constructing.
  • Am Easy to test- all dependent object I use can be passed to me, often in my constructor (typically as Mock Objects).
  • Accept dependent object that can easily be substituted with Mock Objects (I don't use Concrete Class Dependency).
  • Chain multiple constructors to a common place (using this(...)).
  • Always define hashCode() alongside equals()
  • Prefer immutable value objects that I can easily throw away.
  • Have a special value for 'nothing' - e.g. Collections.EMPTY_SET.
  • Raise checked exceptions when the caller asked for something unreasonable - e.g. open a non-existant file.
  • Raise unchecked exceptions when I can't do something reasonable that the caller asked of me - e.g. disk error when reading from an opened file.
  • Only catch exceptions that can be handled fully.
  • Only log information that someone needs to see.
© Dan North, Aslak Hellesoy; found at PicoContainer Design Patterns
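
A few of these rules fit into one small class. The Java sketch below is only an illustration and is not taken from the quoted page. The Money class and its "USD" default are hypothetical. It shows constructor chaining, failing fast when constructing, immutability, and hashCode() defined alongside equals().

```java
import java.util.Objects;

public final class Money {
    private final long cents;
    private final String currency;

    /** Chains to the common constructor; the "USD" default is an arbitrary choice. */
    public Money(long cents) {
        this(cents, "USD");
    }

    public Money(long cents, String currency) {
        // Fail fast, even when constructing: never hold an inconsistent state.
        if (cents < 0) throw new IllegalArgumentException("negative amount: " + cents);
        this.currency = Objects.requireNonNull(currency, "currency");
        this.cents = cents;
    }

    /** Immutable: arithmetic returns a new value instead of mutating this one. */
    public Money plus(Money other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException("currency mismatch");
        }
        return new Money(cents + other.cents, currency);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Money)) return false;
        Money m = (Money) o;
        return cents == m.cents && currency.equals(m.currency);
    }

    // hashCode() is always defined alongside equals()
    @Override
    public int hashCode() {
        return Objects.hash(cents, currency);
    }
}
```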

2009-01-19

Changing Windows XP HAL

If you perform a Windows installation with default settings in VirtualBox, Halacpi.dll will be chosen as VirtualBox enables ACPI by default but disables the IO APIC by default. A standard installation on a modern physical PC or VMware will usually result in Halaacpi.dll being chosen as most systems nowadays have an IO APIC and VMware chose to virtualize it by default (VirtualBox disables the IO APIC because it is more expensive to virtualize than a standard PIC). So as a first step, you either have to enable IO APIC support in VirtualBox or replace the HAL. Replacing the HAL can be done by booting the VM from the Windows CD and performing a repair installation.
© VirtualBox Wiki; Migrate Windows

2009-01-09

Java Performance Benchmarks

Several benchmarks comparing C++ and Java exist on the web. The results are mixed: some show that Java is actually faster than C++, while most show that C++ is still faster than Java but by a small margin. The purpose of this post is to talk about the theory behind the Java optimization techniques, so I didn't set out to create my own benchmarks myself, but nothing like hard data to prove a point. So here are the links to some benchmarks found on the web:
  • The Java is Faster than C++ and C++ Sucks Unbiased Benchmark: despite the name, this benchmark shows very similar results between Java and C++, with the occasional scenario where C++ beats Java hands down.
  • The Java Faster than C++' Benchmark Revisited: someone who didn't like the benchmark above and found different results, where C++ has a clearer lead. Even so, Java is still close, wins some benchmarks, and is clearly slow only in a handful of tests.
  • The Computer Language Benchmarks Game: compares a number of programming languages using different algorithms. Gnu C++ and Java 6 are compared, and C++ wins most of the comparisons, but in most of the cases by a very close margin, and Java is the occasional winner in some of the tests.
© Domingos Neto; Java Performance
...and one more update: Stefan Krause; Update for Java Benchmark.