Recent Updates 2013

02/20/2013

I now support the alternate syntax for CASE expressions. Apologies to anyone using the Case class. This class has been renamed to MatchCase. There alternate CASE syntax is handled by the ConditionalCase class. You use it like this:

ConditionalCase options = new ConditionalCase();
options.AddCaseOptions(new NullFilter(table.Column("Name")), new StringLiteral("<NULL>"));
options.Default = table.Column("Name");

This will generate SQL that looks like this:

SELECT 
CASE 
    WHEN Table.Column IS NULL THEN '<NULL>' 
    ELSE Table.Column
END

I am looking for other odd-ball standard SQL that I may be missing. I see the CAST function, but it involves parsing type names, which is quite involved. In the not-so-near future, I may think about supporting Data Definition Language (DDL) in additional to Data Manipulation Language (DML), but that would easily double the size of the current project and isn't needed for everyday programming. How often do you need to create a table, stored procs, etc. dynamically on-the-fly with a different schema depending on user input?

02/18/2013

CASE Expression

There's a CASE expression supported by just about every vendor. It looks something like this:

SELECT DayOfWeekId,
CASE DayOfWeekId
    WHEN 1 THEN 'Monday'
    WHEN 2 THEN 'Tuesday'
    WHEN 3 THEN 'Wednesday'
    WHEN 4 THEN 'Thursday'
    WHEN 5 THEN 'Friday'
    WHEN 6 THEN 'Saturday'
    WHEN 7 THEN 'Sunday'
    ELSE 'Unknown'
END AS DayName
FROM DayOfWeek

CASE expressions can appear as SELECT items, in filters and just about anywhere else. Most of the time, I've not used them, but some times they are absolutely necessary for being platform independent or doing some forms of reporting.

In order to represent the CASE expression, there is a new Case class. The following code could regenerate the SQL above:

SelectBuilder builder = new SelectBuilder();
AliasedSource dayOfWeek = builder.AddTable(new Table("DayOfWeek"));
builder.AddProjection(dayOfWeek.Column("DayOfWeekId"));
Case options = new Case(dayOfWeek.Column("DayOfWeekId"));
options.AddCaseOption(new NumericLiteral(1), new StringLiteral("Monday");
...
options.AddCaseOption(new NumericLiteral(7), new StringLiteral("Sunday");
options.Default = new StringLiteral("Unknown");

Still Looking into Name Resolution

I am still researching how I want to identify identifiers as column names or aliases. I want to make sure that the same instance is returned when referring to the same name. This will probably mean extending the AliasedSource class to be a flyweight. It will need to be smart enough to handle duplicate names (throwing ambiguity errors) and partially qualified names. Once I have that, I should be able to reuse the scoping already present in the command builder code to find the first item matching the name.

In places, I know I am retrieving a column, such as in an INSERT statement's column list or the left hand side of a SET. In other places, I could be referring to a Column, an AliasedProjection or even a Placeholder if I have no idea what something is.

To make things more complicated, I am building the SelectBuilder so I don't necessary have all the AliasedProjections yet. I am going to have to grab all the aliases and columns first and then go back and process sub-selects and windowed functions. There's a good chance the complexity will result in code that makes me want to barf. I'll try my best.

02/16/2013

Window Functions

Today I added support for analytic functions. Personally, I'm not overly familiar with them. I did some really intense studying to learn the cumbersome syntax. I'm still struggling to think of times where I'd actually use some of the features. What I support is a little more broad than just analytic functions - something called "window functions". Like usual, my library won't check that you're using supported features with certain functions. This philosophy keeps my code simple, allows for unanticipated vendor-specific implementations and, most importantly, saves me lots of time.

What motivated me to support window functions was paging. I've been thinking about supporting MySQL's LIMIT/OFFSET or SQL Server's new OFFSET/FETCH. I probably still will at some point. There's really not a standard way of implementing paging across database platforms. In older T-SQL systems, you had to work with a window function called ROW_NUMBER. Here's one of those rare places where Oracle makes things easier. Paging is common enough that I want to support it. I already support TOP for the same reason, even though it is T-SQL specific. This is one of those times that makes me think I need to start breaking out my SQL grammar definition. I would prefer someone else be able to help me do that. It's not something I want to spend a lot of time doing if the project doesn't get widely used. Just remember, every time I modify the grammar, I have to update the Formatter and the CommandBuilder.

Simplified Token Recognition

I also simplified the code for tokenizing text. For converting builders into to SQL, I use the Formatter class. In this case, the text is pre-tokenized. All I need to do is identify what type the tokens they are. Technically, I know this information when creating the tokens and I will probably just change the code to pass it along at some point. It's important to keep in mind that there are well over 100 pieces of code generating tokens at this point - so I need to invest a lot of time to make even a minor change like that. When parsing SQL with CommandBuilder, I need to tokenize a string. Both ways spit out the token type and value. I was able to simply "imbue" pre-tokenized tokens with the token type. I also made it the responsibility of the calling code to check whether the token is the right type, which really simplified the code.

Discerning Identifiers

I am a little perplexed at the moment. When I find an identifier in the midst of a SQL statement, it could be an unqualified column, and alias, a parameter or something else entirely. I can't really tell... that's the problem. The window functions made me realize another issue: references to columns within an OVER clause can be aliases defined for other columns in the same SELECT. With ORDER BY, I had the advantage of already knowing all of the columns selected. Also troubling, SQL allows you to project multiple columns with the same name, so there's ambiguity when searching for a column. Now, this hasn't been an issue because the Formatter really doesn't care. However, you may run into problems if you try to find a Placeholder that is silently converted into a Column or search for an aliased column and get a new, unqualified, column instead. I'm not sure how to solve this problem without some sort of schema information. One idea I had was to make ambiguous items Placeholders, tracking as many aliases as I can. The aliases might be difficult to track because they come into existence at arbitrary times.

02/12/2013

I was able to quickly finish the Command Builders portion of the code. I ran into a few problems, but nothing I couldn't overcome.

It makes me hideously aware how badly I need a better Lexer/Parser. I had to do some serious tweaking to boost performance. Think about it: your database is optimized to parse hundreds of SQL a second. My parser is struggling to handle just a few in the same amount of time. I guess that's the difference between recursive descent and LALR! A little look-ahead would go a long way to avoid trying unlikely expressions. I also think I could benefit from a more "exact" implementation of the SQL grammar. Honestly, though, standard SQL has a lot of garbage that I've never seen anyone use. There are a lot of issues dealing with multiple vendors. At some point, it might make sense for me to define a separate grammar for each vendor, which is why I architect-ed it the way I did.

02/11/2013

This project is pretty close to going from "useful" to "really awesome". Today, I got really close to finishing the Command Builders portion of the code. This will allow you to convert SQL into SelectBuilder, InsertBuilder, UpdateBuilder and DeleteBuilder.

I also learned some places where my SQL grammar wasn't correct. I wasn't applying right precedence to parentheses, which meant that logical expressions (ANDs and ORs) and arithmetic expressions (+, -, *, /) weren't working as expected in every case. I also wasn't correctly support NOT and negation (-(1 + 2)). I am adding a class to allow for negating an arithmetic expression. This is actually somewhat tricky because I need to be careful about generating parentheses (so they don't double up).

I also had a few minor bugs with the way joins were being wrapped in parentheses (in case you're using MS Access).

There are still some open questions about how to properly qualify an unqualified column. In some cases, you can infer which table a column belongs to because it is the only table in the statement. In other cases, I really have no way of telling. So, I just do my best to figure out which table it belongs to. When I say table, I really mean "table-like" thing. If you don't see why this is complicated, think about the case when a SELECT contains another SELECT.

I think I will create a Window class that will be sub-classed to support things like MySQL's LIMIT/OFFSET feature. I will probably try to implement this for a few databases. Although, some database's approach to this feature are beyond wonky.

I am planning on extending the Formatter class, adding in events that will fire before and after each expression, allowing you to generate any custom text you want. This could be useful for adding a WITH clause to the end of a join or plugging in a really odd piece of vendor-specific SQL.

02/09/2013

I successfully rewrote the formatting code. I decided that I don't care if I can spit out nicely formatted SQL (for now). If you want to see your code formatted, you'll have to use another tool. The nice thing is that I was able to write a decent unit testing suite, so I am feeling much more confident about my results.

Today I added support for EXISTS, ALL, ANY and SOME. EXISTS is pretty common (because it is known to boost performance in some cases) but the other three are fairly rare. I just learned about them today, in fact. All of these could have been implemented in terms of a function call (almost) but I wanted to directly support them.

I also learned that UNION, INTERSECT, EXCEPT and MINUS (Oracle) support DISTINCT/ALL qualifiers. This means that "UNION ALL" is just a "UNION" with the "ALL" qualifier. Who knew? For the same reason, SELECT's IsDistinct property was replaced with a Distinct property, whose value is an enum (DistinctQualifier). The same enum is shared across all these classes.

Next, I am going to focus on making it possible to create command builders from SQL. This will allow you as a developer to write the static parts of your SQL in plain text and convert it over to a command builder. From there, you can search the object model and make your modifications. Once you make your change, you can convert the command builder back into SQL. It's really cute, to be honest. Here's an example:

string commandText = "SELECT * FROM Customer";
CommandBuilder builder = new CommandBuilder();
SelectBuilder select = (SelectBuilder)builder.GetCommand(commandText);
Column column = select.Sources["Customer"].Column("customerId");
Placeholder customerIdParameter = new Placeholder("@customerId");
EqualToFilter filter = new EqualToFilter(column, customerIdParameter);
select.AddWhere(filter);
Formatter formatter = new Formatter();
commandText = formatter.GetCommandText(select);

In this example, we're just adding a filter to our SELECT statement.

Next on the agenda is allowing commands to be run against a DataSet. This will let you write SQL that can retrieve and update data in a DataSet. This will allow you to use the same code for hitting a real database as well as a DataSet, which could be useful if you're testing code or wanting to switch between online and offline modes.

This library is fairly close to supporting 100% standard SQL plus some extensions for popular database systems. Initially, I wasn't sure if writing a SQL parser was really necessary. However, I see now that the parser really opens up some possibilities. Had I not gone that route, it would have never occurred to me to convert SQL into command builders. I wonder about the performance characteristics. Compiling code at runtime has to have some significant overhead. I still think the overhead is less than the time it takes for a database hit to take place, so we're in good shape. Plus, I am running nearly 100 unit tests of varying complexity and I haven't seen anything take longer than a few milliseconds.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly