
XML processing with XQuery (part 1)

A few weeks ago, one of my friends came to me with the following problem. He was using a software music player that stored his music library and playlists in an XML file, and he wanted to do some extractions and transformations.

Here's what the XML looks like:

<song id="1" artist="Sarah Mc Lachlan" title="Fallen" />
<song id="2" artist="Moby" title="Extreme Ways" />
<song id="3" artist="Jewel" title="Becoming" />
<playlist>
<song id="2"/>
<song id="3"/>
</playlist>

XML is supported natively by the engine as a standard datatype, so you can use XML in any place where an atomic type is allowed. There are some restrictions, but they are beyond the scope of this blog entry. Let's put it in a variable; later we can use a table if we want. For quick testing, a variable will do fine.

declare @x xml
set @x = '<song id="1" artist="Sarah Mc Lachlan" title="Fallen" />
<song id="2" artist="Moby" title="Extreme Ways" />
<song id="3" artist="Jewel" title="Becoming" />
<playlist>
<song id="2"/>
<song id="3"/>
</playlist>'
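
If you prefer, the same fragment could live in a table column instead of a variable. Here's a minimal sketch (the table and column names are made up):

-- store the fragment in an xml column for later querying
create table music_library (id int primary key, library xml)
insert music_library values (1, @x)
select library from music_library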

Our goal is to resolve the song ids in the playlist and produce this output:

<playlist>
<song id="2" artist="Moby" title="Extreme Ways" />
<song id="3" artist="Jewel" title="Becoming" />
</playlist>

We'll use the XQuery language to achieve that. If you'd rather skip all the steps and go straight to the solution, here it is:

select @x.query('<playlist> {for $id in /playlist/song/@id 
   return <song artist="{/song[@id = $id]/@artist}" title="{/song[@id = $id]/@title}"/>} </playlist>')
 

But the journey is often more rewarding than the solution itself, and this trip will take us through various areas of the product like:

  • XQuery/XPath
  • Binary XML
  • Relation+ algebra
  • UDX
  • and many other things

So let's get started.

In SQL Server, the support we have for XML is based on the XML data model (XML 1.0 standard). Our XML is actually an XML fragment since it doesn't have a prologue (the <?xml ...> header), but that's fine for our purpose. This model organizes the content in a tree where elements are containers, attributes belong to elements, and text nodes are leaves.

Because SQL Server works on the XML data model content, it allows for easy navigation of the tree. For instance, you can refer to the third song from the top or the first song inside the playlist using a language called XPath. For this same reason, the database engine does not guarantee strict character-by-character preservation. In fact, in many cases, if you retrieve the XML fragment, it will not be lexically identical to the XML you inserted: we drop the XML prologue, we can reorder the attributes of a given element, and insignificant whitespace is also dropped. There is usually a bit of confusion regarding what counts as insignificant whitespace, so it's worth taking some time here to clarify.

Insignificant whitespace and whitespace entitization

Insignificant whitespace consists of whitespace-only text nodes (by whitespace, we mean space, tab, line feed and carriage return). The key points are whitespace only and text nodes. Let's look at some examples.

<a>
	<b/>
</a>

That's equivalent to

<a><b/></a> 

But if you add some non-whitespace content, then the whole text node becomes significant:

<a>
	Hello
	<b/>
</a>

Here you cannot drop anything. You can't remove whitespace inside attribute values either, because it's not a text node. And if a whitespace-only text node is contained in an element (directly as a parent or somewhere as an ancestor) that has xml:space='preserve', then it has to be kept too. Sometimes we need to entitize some characters because otherwise they would be transformed into a space.

In attribute values, we entitize tabs (&#x09;), line feeds (&#x0A;) and carriage returns (&#x0D;). This section of the XML spec explains the rules behind this.

In text nodes, we only entitize carriage returns. We obviously have to entitize all the 'special' characters such as &, <, >, etc. which have a particular meaning in XML.
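
A quick way to see the attribute-value entitization (a sketch; the exact serialized form may vary slightly by version):

declare @y xml
-- the carriage return and line feed are written as character references in the input
set @y = '<a attr="line1&#x0D;&#x0A;line2"/>'
-- on serialization they should come back entitized (as &#xD; and &#xA;) so they survive a round trip
select @y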

Some applications require stricter preservation of whitespace. Word processors that save their documents in XML form are interested in keeping all the whitespace. SQL Server provides an option to indicate this. When you insert the XML, there is an implicit conversion from text to XML.

set @x = '<a/>' is equivalent to set @x = convert(xml, '<a/>')

The advantage of this syntax is that you can specify additional options to the convert operation. With option 1, you indicate strict whitespace preservation

set @x = '<a>             </a>'
select @x

returns

<a/>

but,

set @x = convert(xml, '<a>           </a>', 1)
select @x

returns

<a>          &#x20;</a>

Note that we entitized the last whitespace character so that if the same string is reinserted into the server, the whitespace will be preserved even if option 1 is not present. This is important in replication scenarios where this XML is shipped to a remote server: we want to keep the replicated version identical to the original one.


Next week we'll talk about how the server stores the XML and XPath.

posted by hanh | 0 Comments

An article on FOR XML in Microsoft SQL Server 2000 and 2005

I just posted an article What does server side FOR XML return? (http://blogs.msdn.com/sqlprogrammability/articles/576095.aspx) which gives some details of the design and performance characteristics of various FOR XML flavors.

 

I also added links to XML whitepapers and publications on the design of XML features in Microsoft SQL Server 2005 to the links section of the main SQL Programmability & API Development Team Blog page (http://blogs.msdn.com/sqlprogrammability/default.aspx).

 

Best regards,

Eugene Kogan

Technical Lead

Microsoft SQL Server Relational Engine

posted by ekogan | 0 Comments

Strong named assemblies and AllowPartiallyTrustedCallers

Very often, you may wish to factor your code into separate assemblies in your application. For example, you put your type in one assembly because it gets used by multiple areas of your application, and you put the proc that uses that type in another assembly.

So let's say you have two CLR assemblies. Assembly A contains a UDT. Assembly B has functions that use the UDT in their signatures and/or in their code. Assembly B references A.

A problem occurs when they are strong-named. When a function in B is called, this error is thrown:

Msg 6522, Level 16, State 1, Line 1
A .NET Framework error occurred during execution of user defined routine or aggregate 'MyProc':
System.Security.SecurityException: That assembly does not allow partially trusted callers.
System.Security.SecurityException:
at System.Security.CodeAccessSecurityEngine.ThrowSecurityException(Assembly asm, PermissionSet granted, PermissionSet refused, RuntimeMethodHandle rmh, SecurityAction action, Object demand, IPermission permThatFailed)

What's going on?

You tried writing your own executable that calls the UDT in A the same way, and it works! But it doesn't work within the server, and you are wondering whether it's a bug in the server.

Well, this is actually by CLR design. MSDN says that types in strong-named assemblies can be called by partially trusted code only when the assemblies have been marked with the AllowPartiallyTrustedCallers attribute (APTCA). It adds that this attribute essentially removes the implicit LinkDemand for the FullTrust permission set that is otherwise automatically placed on each publicly accessible method of each type. Please note that "FullTrust" in the CLR refers to unrestricted access, which corresponds to the 'unsafe' permission set in SQLCLR terms.

So, when assembly B calls a strong-named assembly A, either A must have the AllowPartiallyTrustedCallers attribute or B must have the unsafe (FullTrust) permission set.
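
In T-SQL terms, the setup looks roughly like this (assembly names and paths are hypothetical):

-- A contains the UDT, B references A; both are strong-named
create assembly A from 'C:\assemblies\A.dll' with permission_set = safe
create assembly B from 'C:\assemblies\B.dll' with permission_set = safe
-- for calls from B into the strong-named A to succeed, A must be built with the
-- AllowPartiallyTrustedCallers attribute, or B must be cataloged with permission_set = unsafe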

What does this mean?

It means that the CLR requires the callers of a strong-named assembly to be fully trusted unless that assembly has the APTCA attribute. If you add another assembly C that calls the functions in B, then B should have the APTCA attribute or C should be unsafe. We recommend that if your assemblies in the database are SAFE/EXTERNAL_ACCESS (i.e. not FullTrust, but partially trusted / restricted), you mark them with APTCA to support cross-assembly calls.

The reason your executable worked is that, by default, every assembly on your local drive gets FullTrust. If you copy your assemblies to a network share and run them from your local machine, you will see the same error, because the default CLR policy does not give FullTrust to assemblies on remote shares.

Here's the history of this requirement for SQLCLR assemblies. The CLR introduced the APTCA model to protect assemblies that are shared across multiple applications. Typically such assemblies were put in the GAC by applications; assemblies that are NOT shared usually live in the application path. Any application that did not have unrestricted access (FullTrust/unsafe) could load only those assemblies that are in the GAC or those found by probing the location specified by the AppDomain's APPBASE property (application path / database). The CLR loader guaranteed that and protected against restricted assemblies loading arbitrary assemblies. In the case of SQL Server, the hosting hooks guaranteed similar control over what gets loaded in its appdomain. But there was no corresponding mechanism for protecting shared assemblies (such as those in the GAC) from being called by arbitrary assemblies. APTCA enforced that by either requiring the caller to have unrestricted access or requiring those shared assemblies to opt in to allowing arbitrary callers. Using the attribute implied that the shared assembly would be audited/reviewed and then deemed safe for use from any caller.

The problem with APTCA is that the CLR checks for it in *every* assembly instead of restricting the check to shared assemblies. Further, the CLR could have skipped checking this attribute for assemblies in the application path / database, which can be considered NOT shared. For example, this relaxation makes perfect sense for SAFE assemblies since they have only execute permissions. They could have been considered implicitly APTCA and allowed to call each other. The CLR allows the host to dictate permissions for the assemblies but does not allow the host to control APTCA behavior.

The workaround we suggest, therefore, is to mark your SAFE/EXTERNAL_ACCESS assemblies with the APTCA attribute in order to allow them to talk to each other. Please note that specifying this attribute on SQLCLR assemblies does not require any audit/review and can be considered the norm for allowing cross-assembly calls, so we recommend it. We are not changing the rules here, just spelling out their relevance to SQLCLR. The directive for audit/review remains valid for assemblies in the GAC that are shared between multiple other applications. In addition, we would not recommend granting unrestricted access to the callers of SQLCLR assemblies; that would be an extreme measure.

 

posted by RaviR | 0 Comments

Server Side Error Handling - Part 3 (Why do I still see error messages from inside tsql try-block?)

There are cases when you see error messages relayed from inside the try-block. Some are intuitive and some are not.

  • Compile Errors - These errors simply cannot be caught. This is true for any programming language: when compilation of a program generates an error, it cannot be caught by the enclosing try-block. As the batch has not started execution, there is no way BEGIN TRY could take effect. I am often asked how to differentiate between compile and non-compile errors. Some errors, like syntax errors, are obvious; some are not. One way to figure it out is to put a PRINT statement immediately after the BEGIN TRY statement. If the message is not printed and you see an error message, it is a compile error and cannot be caught.

BEGIN TRY
   PRINT 'Inside Try-Block'
   SELECT ** FROM T  /* will cause syntax error */
END TRY
BEGIN CATCH
   SELECT error_number() as 'Error In Try Block'
END CATCH

A compile error generated in a lower scope, however, will be caught. This is because the compilation of the lower scope happens during the execution of the enclosing scope (more specifically, after the execution of BEGIN TRY).

BEGIN TRY
   PRINT 'Inside Try-Block'
   EXEC ('SELECT ** FROM T') /* compile error in the lower scope */
END TRY
BEGIN CATCH
   SELECT error_number() as 'Error In Try Block'
END CATCH

  Note that in this case you will see the 'Inside Try-Block' message getting printed.

  • Recompile Errors - These are raised when a statement recompiles because of a change in schema, environment, statistics, etc. They are similar to compile errors, except that they are raised after BEGIN TRY has executed. We debated whether to catch such errors or not. Finally we decided against catching them, to keep the behavior consistent with compile errors. The decision was also influenced by the fact that on recompile we exit the execution driver in SQL Server and enter the compilation module to recompile the statement. This behavior is not guaranteed in future releases; we may decide to catch these errors, since the BEGIN TRY was executed and the handler was activated.
  • Remote execution of a Procedure - When a procedure/batch (pass-through) is executed remotely (through a four-part name or EXECUTE AT), the error handling on the local server depends on the type of remote server, and on the error action at the remote server if it is a SQL Server.
    • If the remote server is not a SQL Server, it is difficult to parse the error message for processing at the local server. Such error messages from the remote server are relayed to the client without any processing at the local server. The local server, though, detects that an error happened during remote execution and raises a generic error. This generic error will be handled by the local tsql try-catch.
    • If the remote server is a SQL Server, the local server tries to find out whether the remote execution resulted in a batch abort or higher. If the remote execution was aborted, then the error causing the abort can be handled at the local server; all previous error messages are relayed. In the absence of a batch abort, all error messages are relayed to the client without any processing at the local server. This behavior is consistent with @@error behavior in prior releases of SQL Server: @@error was set if and only if the remote execution was aborted. We would like to have the same behavior for non-SQL Servers, but OLE DB does not have an API to detect the error action at the server.
  • Attentions - Attentions are sent when a running query is cancelled. The server detects an attention and aborts the execution. There was considerable debate about whether to catch attentions or not. Finally it was decided not to catch them (except for DTC attentions). Attentions (excluding DTC attentions) do not result in an error message; if they were caught, the error intrinsics would not be populated inside the catch block and no logic based on error number could be written.

This concludes my talk on server side error handling. If you have any questions or comments please feel free to send them my way.

Thanks

Sorting undefined characters in Unicode and/or Windows collation

When comparing two Unicode strings in non-binary collations, SQL Server uses a library that is essentially the same as the Windows API CompareStringW. It defines a weight for each recognized character, and then uses the weights to compare characters. However, not all code points are defined in the sorting library. They may be undefined because:

 

1) The code point is not defined in Unicode standard.
2) The code point is defined in the Unicode standard, but not defined by Windows yet. It takes time and effort to define linguistic sorting semantics for new characters. The Windows team typically needs to work with local standards bodies and/or regional PMs to define sorting rules for new characters. They add new character support in every release and try to catch up. Some characters may have a font defined, and can therefore be displayed correctly, but are still undefined in terms of comparison. For example, NCHAR(13144) - NCHAR(13174).
3) The code point is defined in Windows, but not defined in SQL Server yet.

 

The Windows NLS team decided that undefined characters are ignored during string comparison, partly because there is no really good way to compare them against defined characters. SQL Server inherited these semantics. This does cause some confusing behavior; see the examples below.

 

declare @undefined_char1 nvarchar(10), @undefined_char2 nvarchar(10)
set @undefined_char1 = nchar(0x0000)
set @undefined_char2 = nchar(13144)
select 'Undefined characters compare equal to empty string'
where @undefined_char1 = ''
select 'All undefined characters compare equal'
where @undefined_char1 = @undefined_char2
go

 

create table t (c nvarchar(10))
go
create unique index it on t(c)
go
-- first insert succeeds, but second insert fails with duplicate key error.
insert t values (nchar(0x0000))
insert t values (nchar(13144))
go

 

As you can see, since all undefined characters compare equal, they can cause duplicate key errors. Similarly, if you create one table with the name of an undefined character, and then try to create another table named with another undefined character, the second table creation will fail due to duplicate names, even though the code points of the two undefined characters are different. This can also cause confusing results in string matching builtins such as CHARINDEX, PATINDEX, or LIKE.

 

While these results seem confusing, the basic rule is actually very simple, i.e., undefined characters are ignored during comparison and string matching. Once you understand and remember this rule, the behavior should be easy to understand.

 

There have been arguments about whether undefined characters should be ignored. Since this is the behavior on the Windows platform, there is no clearly better way to sort them, and for backward compatibility, we are going to maintain this behavior.

 

If your app needs to work with these undefined characters and expects to treat them as regular characters, you can use a binary collation. In a binary collation, comparison is based purely on code points, not linguistic rules, so there is no notion of defined vs. undefined.
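
For example, repeating the earlier comparison under a binary collation (a sketch; any *_BIN collation should behave similarly):

declare @undefined_char1 nvarchar(10), @undefined_char2 nvarchar(10)
set @undefined_char1 = nchar(0x0000)
set @undefined_char2 = nchar(13144)
-- with a binary collation the comparison is by code point, so the two are no longer "equal"
select case when @undefined_char1 = @undefined_char2 collate Latin1_General_BIN
            then 'equal' else 'not equal' end as binary_comparison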

SQL Server Identity column FAQs

Q. How are identity values generated? Does SQL Server internally use a lock as the synchronization mechanism?

A. Identity values are generated in memory using lightweight synchronization, i.e. a spinlock (roughly speaking, an Interlocked* function with a yield).

When an identity value is either generated or claimed for use by DML (insert), a log record is used to track that the value (or one greater than it) has been used, before the use of the value is committed. One exception is BCP (bulk insert), which gets or uses these values without logging but is then responsible for doing a logged value-add (into the identity manager) before the BCP is committed. These log records are used during recovery to recover the greatest value given out for the sequence.

 

Q. I understand that identity value generation does not incur locks in SQL Server, so why do I experience lock contention between concurrent inserters?

 

A. Quite often, the lock contention is Storage Engine page latch/lock contention. Suppose session A and session B are both inserting at the same time into the same identity column, and the identity column is the primary or unique key, i.e. an index is built on it. Both sessions will get nearly the same identity values, so they will probably write to the same page in the index. That means there will be contention, and that is natural. There are a couple of ways to handle this; the following lists some simple approaches.

1. Use only one inserter. The monotonic sequencing of identity values is then beneficial: the last index page is likely to stay in the buffer pool, so the transaction will be faster.

2. Introduce a value (or random number) that is specific to a session, such as a user name, node id, etc. Concatenate this with the identity value as the index key.

E.g.

create table table_foo(
    order_id integer identity,
    session_id integer,
    primary key (session_id, order_id))

 

I also see some customers using an identity column to simulate the standard Sequence object feature, where a table is used to generate sequence values. Note that row-level locks on inserted/deleted rows are incurred here, and possibly page locks (if an index is added on the identity column without much thought).
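
The pattern typically looks something like this (a sketch; the table and column names are made up):

-- a table whose only job is to hand out monotonically increasing values
create table seq_gen (next_val int identity(1,1) primary key, filler bit)
go
-- each caller inserts a row and reads back the value it was given
insert seq_gen (filler) values (0)
select scope_identity() as my_sequence_value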

 

posted by andrewz | 0 Comments

Name resolution, default schema, implicit schema Part IV

This post will talk about the implicit schema used in the compile plan and conclude the topic of name resolution, default schema, and implicit schema.

 

A compile plan may refer to a non-qualified schema object. It needs to record the fact that an "implicit" schema is used (name resolution detects this scenario, and the default schema is recorded in the compile plan, where it is used as part of the plan cache lookup key). Batches with unqualified object names result in non-reuse of query plans. For example, in "SELECT * FROM MyTable", MyTable may legitimately resolve to Alice.MyTable if Alice issues this query and she owns a table with that name. Similarly, MyTable may resolve to Bob.MyTable. In such cases, SQL Server does not reuse query plans. If, however, Alice issues "SELECT * FROM dbo.MyTable", there is no ambiguity because the object is uniquely identified, and query plan reuse can happen. (See the uid column in sys.syscacheobjects. It indicates the user ID for the connection in which the plan was generated. Only query plans with the same user ID are candidates for reuse. When uid = -2, it means that the query does not depend on implicit name resolution and can be shared among different user IDs.)
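
One way to observe this is to look at the plan cache after running the query as different users; a sketch (the output obviously depends on your workload):

-- uid = -2 means the plan does not depend on implicit name resolution
select uid, usecounts, sql
from sys.syscacheobjects
where sql like '%MyTable%'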

 

Refer to Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005 (http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx) for more detail on the impact of implicit schema on query plans.

 

posted by andrewz | 0 Comments

Name resolution, default schema, implicit schema Part III

It is time to cover the name resolution algorithm for objects in SQL 2005.

[Algorithm]

In a nutshell, object name resolution follows the same rules as types and XML schema collections. However, SQL Server name resolution and execution context had special behaviors in SQL 2000, as outlined in Part I, not to mention the fact that system objects are now migrated to the resource database and the user-schema separation feature is introduced. Clearly the name resolution algorithm needs to be more complex, with the following goals (more details later):

  • Avoiding luring attack
  • Minimizing user’s name space hijacking
  • Backward compatibility
  • Unifying algorithm with types and XML schema collections
  • Acceptable performance

For DDL access, including CREATE/DROP/ALTER of an object, the algorithm is the same as for types and XML schema collections. The attached flow chart covers the other scenarios for 1-part or 2-part names.

If a server name is specified, the object reference is resolved by the specified remote/linked server.

 

If a database name is specified, the object is resolved in the specified database context. (For system objects, which are visible in every database, this means the compile/execution database context.) It is worth mentioning the case when the schema name is omitted, though. Under this circumstance, the active default schema is defined as follows:

  • For object references outside SQL modules, or for DDL and dynamic SQL inside SQL modules, the current user's default schema in the specified database is used.
  • Otherwise, the default schema (in the specified database) of the module's schema owner is used.

E.g. suppose user bar owns schema foo in database nwind; when pubs..tab1 is resolved, the default schema of the user in pubs that maps to the same login as bar (or guest) is the active default schema:

use nwind
go
create procedure foo.proc1
as
select * from pubs..tab1
go

[Avoiding Luring Attack]

Consider the following scenario: a system admin uses an unqualified name reference to a system procedure from an arbitrary database/schema context.

 

use database1
go
exec sp_addlogin 'somelogin', 'somepassword'
go

 

We recommend that users qualify with schema "sys" when referencing system procedures in SQL2005. However, existing applications may well rely on the SQL 2000 name resolution special behavior of binding system procedures (with the sp_ prefix) tighter than user procedures. If we did not force a "sys first" strategy, the unqualified name reference shown above could bind to dbo.sp_addlogin in database1. A "malicious dbo" could then get a system admin to execute arbitrary code.

 

[Minimizing User’s Name Space Hijacking] 

If the schema is not specified, or dbo is specified, look in the sys schema for any of:

1) prefix 'sp_';
2) prefix 'fn_' or 'xp_', and master context if the dbo schema is specified;
3) a SQL 2000 db-scoped catalog name;
4) a SQL 2000 server-scoped catalog name when the db context is master.

As illustrated in the algorithm logic, system objects are resolved first to avoid luring attacks. Unfortunately, this hijacks the user's namespace: if Microsoft SQL Server ships a system stored procedure sp_foo in SQL2005 or a future release, a user stored procedure with the same name in the dbo schema can no longer be used.

 

The strategy is for system objects to use prefix naming standards, and otherwise to require users to qualify with the schema, as for catalog views, Information-Schema views and dynamic management views. End users are discouraged from using the MS-reserved name prefixes "sp_/fn_/xp_".

 

We may introduce a SQL-path or similar feature in a future release to improve this, as well as to address users' desire to choose a different name resolution search order in general.

 

[Backward Compatibility]

As we can tell from the algorithm, the special behavior of SQL 2000 name resolution with respect to system objects is preserved. There is a subtle difference between SQL2005 and SQL 2000 with respect to the collation used to match names, though.

In SQL2005, system object names are matched using the collation of the current context database. This means that if your database uses a case-sensitive collation, system object names must be matched exactly. E.g. "exec SP_help" will fail to resolve as the system procedure sp_help in a database with Latin1_General_BIN collation, as shown below.
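
A small illustration (assuming the current database uses a binary or case-sensitive collation):

exec sp_help   -- resolves to the system procedure
exec SP_help   -- fails to resolve as a system procedure because the name does not match exactly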

 

The reason is that system objects logically exist in every database; ideally they should be treated the same way as user objects, which are always resolved using the context database collation.

 

This behavior change from SQL 2000 is summarized in the following table. It makes a difference on SQL Server instances where user databases use a different collation from the server. (System databases use the server collation.)

 

System object collation difference between SQL 2000 and SQL2005

Schemas
  SQL 2000: Database collation
  SQL2005: Database collation

Stored procedures, functions
  SQL 2000: Server collation (shipped in master-db and contained in schema dbo)
  SQL2005: Database collation (shipped in resource-db and contained in schema sys)

Functions
  SQL 2000: Keyword collation, i.e. Latin1_General_CI_AS_KS_WS (shipped in master-db and contained in schema system_function_schema)
  SQL2005: Database collation (shipped in resource-db and contained in schema sys)

Information-Schema views
  SQL 2000: Server collation (shipped in master-db and contained in schema INFORMATION_SCHEMA)
  SQL2005: Database collation (shipped in resource-db and contained in schema INFORMATION_SCHEMA)

Tables
  SQL 2000: Server collation (shipped in master-db and contained in schema dbo)
  SQL2005: Database collation (changed to back-compatible views, shipped in resource-db and contained in schema sys)

Tables
  SQL 2000: Database collation (shipped in all databases and contained in schema dbo)
  SQL2005: Database collation (changed to back-compatible views, shipped in resource-db and contained in schema sys)
 

posted by andrewz | 0 Comments
Attachment(s): Name Resolution.doc

Name resolution, default schema, implicit schema Part II

The last section clarified some background; let us continue with the name resolution algorithm for types and XML schema collections.

[Algorithm]

As implied by the general syntax for referencing entities, types and XML schema collections cannot be referenced across databases. If the schema name is unspecified, SQL Server uses a "sys first" algorithm as outlined below.

1. First, look in the "sys" schema (if it is not DDL access), then
2. look in the active default schema, then
3. look in the "dbo" schema.

If the schema name is specified, SQL Server simply looks in the designated schema.

 

Note:

  • Database collation is used to match schema, type and XML schema collection names.
  • DDL access includes CREATE/DROP/ALTER on types or XML schema collections.

 

[Backward Compatibility]

 

During upgrade, system types are put in the "sys" schema and user defined types are put in the "dbo" schema. In SQL 2000, user defined types can only be created through sp_addtype. This procedure is modified in SQL2005 to always put newly created types in the "dbo" schema. Since types can only be referenced using a single-part name in SQL 2000, the above algorithm clearly satisfies the backward compatibility requirement.

 

Note that the new type DDL can be used to create types in schemas other than "dbo" and "sys". It is recommended to qualify with the schema when referencing user defined types in applications if the active default schema cannot be depended on.
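
A small sketch of the new DDL (the schema and type names are made up):

create schema app
go
-- a type created outside dbo/sys must be referenced with its schema
create type app.phone_number from varchar(20) not null
go
declare @p app.phone_number
set @p = '555-0100'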

 

XML schema collections are newly introduced in SQL2005, so there is no backward compatibility issue with them.

 

Next time we will look at the name resolution algorithm for objects.

posted by andrewz | 0 Comments

Name resolution, default schema, implicit schema Part I

This series of blogs describes the name resolution behavior of schema objects, types and XML schema collections, and how it interacts with default schema and compile-plan implicit schema.

The first part will lay some background so that we are on the same page.

The following shows the general syntax for referring to an object/type/XML-Schema-collection in SQL Server:

               [server.] [database.] [schema.] entity
where:
  • entity is the name of the object/type/XML Schema collection.
  • schema is the schema containing the entity. The schema qualifier lets you refer to an entity in a schema other than default or implicit schema. Only schema entities can be qualified with schema. Schema entities are schema scoped entities including types, XML schema collections, and objects such as tables, views, procedures, functions, DML triggers, constraints, rules, defaults, etc. Non-schema objects, such as DDL triggers, event notifications, cannot be qualified with schema because they are not schema scoped.
  • database is the database context of the entity. This identifier is only valid for schema scoped objects but not types or XML collections.
  • server applies only when you are referencing an object on a linked/remote server. This is the name of the linked server containing the object. The server qualifier lets you refer to an object on a server other than your local server instance. Not all SQL statements allow you to access objects on linked/remote servers. If the server part is specified, the object is resolved by delegating to the specified linked/remote server.

You can include spaces around the periods separating the parts of the reference to the entity, or omit the database or schema part between the periods. If the name of any part includes special characters, you can use a quoted identifier (see the Identifiers section in BOL).

In SQL 2000, types are owned by users and they are not schema scoped, i.e. their names are unique in the containing database. The only way to reference a given type is to use the single-part (entity) name.

 

[System Objects]

In SQL2005, system objects are shipped with SQL Server in the resource database. (See the resource database CSS material.) These include views, scalar and table functions, stored procedures, and extended stored procedures. Except for the Information-Schema (ANSI) views that belong to the system schema "INFORMATION_SCHEMA", all system objects belong to a new, reserved system schema "sys". (Note: if a user database has a schema called "sys", it cannot be upgraded.)

These system objects are visible (or resolvable) in every database, and can be compiled and/or executed in a database as if they were created in that database. The following are feature definitions of the different types of system objects.

  • System-Stored-Procedure – a System Object that has a name beginning with “sp_” and is either a stored-procedure or an extended-stored-procedure.
  • System-Extended-Stored-Procedure – a System Object that has a name beginning with "xp_" and is either an extended-stored-procedure or a stored-procedure. Unless schema 'sys' is specified or implied, it is visible only in the "master" database (and thus always executes in master).
  • System View – a System Object that is a view. It can be a backward compatible view, a catalog view, a dynamic management view (DMV) or an Information-Schema view. Unless the schema context is implied or it is a backward compatible view, it must be qualified with the explicit schema 'sys' or 'INFORMATION_SCHEMA'.
  • System Function – a System Object that is a function (scalar or table) that “exists” (or is visible) in every database. It is recommended that the name begins with “fn_”.

Most of these system objects are migrated from SQL2000, where they were contained in schema "dbo" or "system_function_schema" (unbindable and hidden; it only contains system functions) of the master database. Back then, SQL Server name resolution and execution context had the following special behaviors to handle system objects:

1. Resolve "anydb.dbo.sp_" to "master.dbo.sp_", but execute in "anydb" context.

When you run either of the following two batches:

USE my_database; EXEC dbo.sp_my_procedure; go
EXEC my_database.dbo.sp_my_procedure; go

 

SQL Server 2000 searches master.dbo before my_database.dbo, therefore finding the system procedure sp_my_procedure if it exists. This means that a fully-qualified 3-part (database, schema and object) name may not literally resolve to what it says it should; rather, a system object from the master database could be resolved instead and evaluated in the context of the specified database. In other words, user procedures (if resolved) are always compiled and executed in the database where they exist, but system procedures are compiled and executed in every database. Clearly, we are hijacking the user's namespace in this scenario, i.e. if a user-created procedure contained in schema "dbo" has the same name as a system procedure, the user will not be able to execute it.

Important note: in the above example, SQL 2000 considers dbo.sp_my_procedure in master a system procedure only if it is marked as MS-shipped. If it is not marked as MS-shipped, then the user procedure in my_database will be resolved before it. However, if the user procedure in master is indeed resolved, then it is compiled and executed in the master database context, except for the SQL 6.5 system tables referenced in it, which are compiled in the my_database context. This behavior was for backward compatibility. It is still preserved in SQL2005 for TSQL procedures (not CLR procedures) but will be removed in some future release.

2. System table-valued functions must be invoked with the special "::" prefix, and system scalar-valued functions must be invoked without any schema.

In SQL 2000, system functions were contained in the hidden schema "system_function_schema", which cannot be referenced, while user defined functions must be qualified with the schema. As a consequence, this rule makes system function invocations unambiguous. In SQL2005, the "::" syntax is being deprecated; instead it is recommended to qualify with schema "sys". Compatibility is maintained only for the system functions shipped in SQL 2000, by keeping an internal table that maps the old syntax to the replacement system function.
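
For example (using fn_helpcollations, a system table-valued function that dates back to SQL 2000):

-- SQL 2000 style, still supported for the functions shipped then
select * from ::fn_helpcollations()
-- recommended SQL2005 style
select * from sys.fn_helpcollations()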

3. Information-Schema views must be referenced using schema "INFORMATION_SCHEMA"; they exist in the master database but are compiled in every database.

4. All other MS-shipped "system" objects in the master database, including extended procedures, user defined functions and system tables, have to be invoked by specifying the master database, and thus are always invoked in the master database context.

 

Some important terminology --

[User’s Default Schema] In SQL 2000, there is an implicit relationship between users and schemas. Users (actually all database principals) by default are owners of a schema that has the same name as the user (database principal). This implicit relationship is explicitly persisted upon upgrade from SQL 2000. SQL2005 enables associating a user with a default schema that does not have the same name as the user, or associating multiple users with the same default schema. If a user's default schema is not specified when creating the user, dbo is the default schema.
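
In SQL2005 the association is made with DDL like the following (user, login and schema names are hypothetical):

create user alice for login alice with default_schema = sales
-- or, for an existing user
alter user alice with default_schema = sales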

[Execute-As Context] A procedure/function can be executed with the privileges of its owner, with the privileges of the current user (caller), or with the privileges of an arbitrary user, depending on the procedure/function definition.

A standalone Execute-As/SETUSER can be executed in the current session or inside a procedure/function.
[The Current Login/User] The session login/user is either the logged-in login/user or the login/user associated with the remote linked/remote-server session. Without invoking any Execute-As context, either through a procedure/function or standalone, the current login/user is the session login/user.

On a database context switch, the user registered in the new database that is mapped to the current login becomes the new current user. If no user is registered for the login, the guest user is used.

On entering an Execute-As context (except execute as caller), the specified login/user becomes the current login/user. On exiting an Execute-As context, the current login/user reverts to the previous current login/user, that is, the current login/user that triggered the Execute-As context switch.
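
A standalone context switch looks like this (the user name is hypothetical):

execute as user = 'alice'
select user_name()   -- now reports alice as the current user
revert
select user_name()   -- back to the previous current user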

[Active Default Schema] The active default schema is the schema of the SQL module when the entity is referenced by a statement (other than DDL, dynamic SQL, a.k.a. EXECUTE statements, or intrinsics such as object_id()) inside a procedure, function or view. Otherwise, the active default schema is the current user's default schema.

Let us continue with the name resolution algorithm next time.

posted by andrewz | 0 Comments

Server Side Error Handling - Part 2 (Errors and Error Messages)

Each error in the server has two parts - an error message that describes the error that happened in the server and an error action that determines the effect of the error on the statement, level, batch, transaction, connection and/or server.

The server has different combinations of errors and error messages. The major ones are:

  1. One error, one error message - This is the most common case and most intuitive.
  2. One error, two or more error messages - When an error is raised in some special context, the first message has information about the error, while subsequent messages provide information about the context. This becomes an issue inside tsql try-catch. In the absence of any notion of an error collection, the catch block is activated with one of the error messages. As sql server 2000 would have set @@error to the last error message, we decided to set the error intrinsics (error_message(), etc.) to the last error message inside the catch block. In other words, sql server 2005 ignores all but the last error message inside tsql try-catch. Many systems would merge these error messages into one. One way to merge two error messages would be to provide the first message as a payload of the context message. Backward compatibility prevents us from merging old error messages. Care will be taken not to have multiple error messages for one error in future releases.
  3. Multiple errors, multiple error messages - In this situation the server displays as much information as possible before stopping execution. This is mostly useful in development, when one wants to see all compile errors or permission failures before the execution is terminated. Inside tsql try-catch the ideal behavior would be to stop execution on the first error and handle it in the catch block. This is what we have tried in sql server 2005. Note this is different from @@error behavior, where it would be set to the last error.
  4. No error, one or more error messages - This happens when the server recovers from non-severe errors and continues with execution. In reality these are informational or warning messages. They should have been sent with a lower severity (10, to be precise), but that cannot be done because of backward compatibility. As clients do not expect to see any error messages inside tsql try-catch, these are silently ignored inside a try-block. The execution proceeds as if nothing had happened.

Once an error happens, the server determines the action to take. A statement abort aborts the current statement. A level abort terminates the executing proc and is visible as a statement abort in the scope of the caller. A batch abort terminates the batch. A transaction abort aborts the batch and rolls back the transaction. Severe errors kill the connection, while extremely severe errors shut down the server.

Currently, the error action depends on the error message, the severity with which it is raised and the context in which it is raised. For a tsql developer, it is very difficult to determine the error action for any error during development. Going forward we may enforce a rule where the error action depends only on the error message. To ensure that error handling does not break on upgrades, this can only be done for new error messages.

In the next post I will continue the discussion with certain error messages that are still relayed to clients from inside tsql try-catch.

Thanks

Server Side Error Handling - Part 1 (Migrating from @@error to tsql try-catch)

Tsql try-catch was added to improve server side error handling in sql server 2005. This feature should have been part of early T-Sql. Better late than never.

In the absence of tsql try-catch, server-side error handling was done using @@error. This necessitated statement abort as the error policy. Under this policy, execution of a batch/procedure continues after a full or partial rollback of the tsql statement that raised an error.

Command:

print 'test @@error'
select 4/0
if (@@error = 8134)
 print 'Error Encountered'
print 'Execution Continues'

Output:

test @@error

-----------
Msg 8134, Level 16, State 1, Line 2
Divide by zero error encountered.

Error Encountered
Execution Continues

There are two drawbacks:

  1. 'Statement Abort' as an error policy is rarely found in a procedural language. This is counter-intuitive. 
  2. Clients do not expect to see any error message if an error is handled at the server.


The above batch can be replaced using tsql try-catch.

Command:

begin Try
      print ' test tsql try-catch'
      select 4/0
      print 'Execution Halts'
end try
begin catch
      print ' error caught'
end catch

Output:

 test tsql try-catch

-----------

(0 row(s) affected)

 error caught

There is no 'Statement Abort' inside a tsql try-catch. The error is handled at the server and no error message is relayed to the client. Execution inside the tsql try-catch stops on an error and the catch block is activated.
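
Inside the catch block you can also inspect the error through the new error intrinsics; a small sketch:

begin try
      select 4/0
end try
begin catch
      -- error_number()/error_message() are populated only inside the catch block
      select error_number() as err_no, error_message() as err_msg
end catch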

For backward compatibility we still have 'statement abort' as an error policy. Moving forward, there is a chance that it might be deprecated.

In the next post, I will discuss why some errors are not caught (either sent to the client or silently ignored) by tsql try-catch.

Thanks

Host Policy level Permission Sets and their permissions

While there is guidance on how to use the various permission sets - SAFE, EXTERNAL_ACCESS and UNSAFE - a list of the permissions granted by each of the three helps clarify their definitions.

SAFE:
Code that has this permission set is only allowed to execute and use context connections.

EXTERNAL_ACCESS:
Code that has been granted external_access can access system resources and go over the wire. Code can do distributed transactions, ping, make connections, request DNS information, read internet resources, read and write key containers and certificates, read and write environment variables, files, folders and registry keys, and register and respond to events. Code is not only allowed to execute, but can also assert that callers bear adequate permissions and can impersonate.

UNSAFE:
In addition to all of the above, it allows calling unmanaged code.
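
The permission set is chosen when the assembly is cataloged; a sketch (the assembly name and path are made up):

create assembly MyUtilities
from 'C:\assemblies\MyUtilities.dll'
with permission_set = external_access
-- it can be changed later, subject to the usual checks
alter assembly MyUtilities with permission_set = safe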

posted by RaviR | 0 Comments

Multiplication and Division with Numerics

It can be surprising to see certain results when doing numeric arithmetic:


declare @num1 numeric(38,10)
declare @num2 numeric(38,10)
set @num1 = .0000006
set @num2 = 1.0
select cast( @num1 * @num2 as numeric(38,10))


Yields:             .0000010000

Instead of:      .0000006000


Why?

Well, Books Online (see Precision, Scale, and Length) dictates the following rules for numeric arithmetic:

Operation: e1 + e2
  Result precision: max(s1, s2) + max(p1-s1, p2-s2) + 1
  Result scale *: max(s1, s2)

Operation: e1 - e2
  Result precision: max(s1, s2) + max(p1-s1, p2-s2) + 1
  Result scale *: max(s1, s2)

Operation: e1 * e2
  Result precision: p1 + p2 + 1
  Result scale *: s1 + s2

Operation: e1 / e2
  Result precision: p1 - s1 + s2 + max(6, s1 + p2 + 1)
  Result scale *: max(6, s1 + p2 + 1)

Operation: e1 { UNION | EXCEPT | INTERSECT } e2
  Result precision: max(s1, s2) + max(p1-s1, p2-s2)
  Result scale *: max(s1, s2)

In this case, the multiplication result precision and scale are computed as follows:

Precision = P1 + P2 + 1 = 38 + 38 + 1 = 77

Scale = S1 + S2 = 10 + 10 = 20

So conceptually, the result should be numeric(77, 20), which isn't allowed. This is where the fine print comes in:

* The result precision and scale have an absolute maximum of 38. When a result precision is greater than 38, the corresponding scale is reduced to prevent the integral part of a result from being truncated.

Books Online currently doesn't go into the specifics on how such truncation is performed.

Since the precision is higher than 38, we try to avoid truncating the integral part of the value by reducing the scale (thus truncating the decimal part of the value instead). How much scale should be sacrificed? There is no right answer. If we preserve too much, the result of multiplying large numbers will be way off. If we preserve too little, multiplication of small numbers becomes an issue.

In SQL Server 2005 RTM (and previous versions), we decided to preserve a minimum scale of 6 in both multiplication and division. So our numeric(77,20) is truncated to numeric(38,6), and is then cast to numeric(38,10). However, at this point it is too late, and some data has been lost. This explains the result you see in the previous expression.
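
One way to observe the intermediate type is sql_variant_property, which reports the precision and scale the engine actually used; a quick check:

declare @num1 numeric(38,10), @num2 numeric(38,10)
set @num1 = .0000006
set @num2 = 1.0
-- reports precision 38, scale 6: only six decimal places survive, so .0000006 rounds to .000001
select sql_variant_property(@num1 * @num2, 'Precision') as result_precision,
       sql_variant_property(@num1 * @num2, 'Scale') as result_scale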

Because of this, it is important to try to minimally quantify the precision and scale of the operands involved in multiplication and division. In this case:

declare @num1 numeric(18,10)
declare @num2 numeric(18,10)
set @num1 = .0000006
set @num2 = 1.0
select cast( @num1 * @num2 as numeric(38,10))

The resulting type would be numeric(37,20). Since this type's precision and scale do not exceed our current limits, no implicit truncation is performed. We then cast the result to numeric(38,10), which doesn't cause data loss in our case.

If you cannot accurately type the values involved in multiplication and division, for example if they are procedure parameters that are called with wildly different values, you may want to look into approximate numeric types (float, real), or perhaps define your own very high capacity user defined exact numeric data type using the CLR.

I hope this was useful. Let us know if there are particular topics you'd like to hear about!

-Mat

Newsequentialid (History/Benefits and Implementation)

In general, we made significant improvements in SQL Server scalability during Yukon. One of the areas of improvement is replication scalability. While doing merge replication testing, we found that scaling was severely affected by a high number of I/O operations. The cause of the problem was that new rows were inserted into random disk pages. The guid generating function (newid) was returning non-sequential guids, which resulted in random B-tree lookups. After some investigation we figured out that we could use the new OS function UuidCreateSequential, with some byte scrambling, to convince the rest of the SQL engine that the guids are produced in sequential order.

Therefore, we implemented the new intrinsic function newsequentialid, which is nothing more than a wrapper over the OS function UuidCreateSequential. As a result, all of the semantics of UuidCreateSequential are present in the newsequentialid intrinsic. The official documentation for UuidCreateSequential is here.

The conclusion: if you are generating guids as your row identifiers, you may be interested in the newsequentialid function. For example, after the replication folks started using this function, they were able to completely fill their data and index pages. Also, the row inserts did not require searches through the B-trees, because the last modified page was already in memory.

You should be aware of all of the limitations of UuidCreateSequential before you start using newsequentialid. For example, computers without a network card return only locally unique values, and the IDs returned can be used to discover some network properties of your server.

 

    Additional notes:

 

1) While testing newsequentialid in highly concurrent environments, we found that on AMD64 machines the OS function may return the same values. Therefore, we put a global mutex around it so that only one user can go through newsequentialid at a time.

2) The old guid generating function newid is also a wrapper over the OS function CoCreateGuid, so all of the semantics of CoCreateGuid exist in newid.

3) The only additional work we do in newsequentialid is to take the guids and rearrange the bytes so that the values work well with the existing engine guid compare algorithm. Since the input bits are unique and we only move the bytes returned by the OS, the output is also guaranteed to be unique.

4) Because two consecutive calls to this function are guaranteed to return different results, the function is marked as non-deterministic. Therefore, we opted to allow the use of this function only as a default value. The reason for this decision is that the algebrizer and optimizer are limited in what they can do with non-deterministic functions.
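
A small sketch of that usage (the table is hypothetical):

create table orders (
      order_id uniqueidentifier not null default newsequentialid() primary key,
      order_date datetime not null default getdate()
)
-- the default fills in a sequential guid for each new row
insert orders (order_date) values (getdate())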