Wednesday, December 17, 2014

Query SQL Server from Powershell using ADO.NET objects

If you want to execute a query against a SQL Server database using Powershell there are a couple of options to consider.

1. SQL Server Invoke-Sqlcmd cmdlet


You can always use the Invoke-Sqlcmd cmdlet, but you will need to have the SQL Server client tools installed, get the paths right, and load the snap-in or module so your script can access the cmdlet.

2. Use ADO.NET objects.

Since ADO.NET has been part of .NET from the beginning and Powershell can access .NET objects, nothing extra needs to be installed to execute a database query.

Here is a function for reuse.

function Invoke-SQL {
    param(
        [string] $dataSource = ".\SQLEXPRESS",
        [string] $username = $(throw "Please specify a username."),
        [string] $password = $(throw "Please specify a password."),
        [string] $database = "master",
        [string] $sqlCommand = $(throw "Please specify a query.")
    )

    $connectionString = "Data Source=$dataSource; " +
            "User Id=$username; Password=$password; " +
            "Initial Catalog=$database"

    # Create the connection and command, then fill a DataSet with the results.
    $connection = New-Object System.Data.SqlClient.SqlConnection($connectionString)
    $command = New-Object System.Data.SqlClient.SqlCommand($sqlCommand, $connection)
    $connection.Open()

    $adapter = New-Object System.Data.SqlClient.SqlDataAdapter $command
    $dataSet = New-Object System.Data.DataSet
    $adapter.Fill($dataSet) | Out-Null

    $connection.Close()

    # Return the table(s) that came back from the query.
    $dataSet.Tables
}

Here is how you call it.
Invoke-SQL "serverHere" "userHere" "pwdHere" "dbHere" "SELECT GETDATE() AS TimeOfQuery;"  

NOTE: The basis of this script is from here. I extended it a little to work with named users instead of integrated security.
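If you prefer integrated security (the behavior of the script this was based on), a minimal sketch is to swap the connection string inside the function and drop the username/password parameters:

$connectionString = "Data Source=$dataSource; " +
        "Integrated Security=True; " +
        "Initial Catalog=$database"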

Friday, November 28, 2014

Pig Basics

What is Pig

Pig is a program that reads scripts written in Pig Latin and generates a MapReduce program automatically. It is generally much easier to use Pig than to write your own MapReduce programs. Grunt is the interactive shell for Pig.

Ways to start Pig

Pig in interactive local mode

Runs in single virtual machine
All files are in the local file system
>pig -x local
grunt>


Pig in Interactive MapReduce mode

Runs in a Hadoop cluster
Is the default mode
>pig (or pig -x mapreduce)
grunt>

Pig executing Pig script file in local mode

Script is written in PigLatin
>pig -x local myscript.pig


Pig executing Pig script file in MapReduce mode

Script is written in PigLatin
>pig myscript.pig

Pig Data

Tuple - like a row in a file or a row in a table, but tuples don't all have to have the same number of items. They can contain scalar types such as int, chararray, double, etc., or even bags. Example: (Jeff, {apple, orange, pear}). Parentheses are used to indicate the tuple datatype.
Bag - a bag of tuples. Curly braces are used to indicate the bag datatype.
Relation - an outer bag. Generally it is what you get back when you filter, group, sort, join, etc. data. In terms of a database, it is kind of like a view or result set.

Data Type  | Description             | Example
int        | signed 32-bit integer   | 300
long       | signed 64-bit integer   | 300L or 300l
float      | 32-bit floating point   | 3.2F, 3.2f, 3.2e2f, 3.2E2F
double     | 64-bit floating point   | 3.2, 3.2e2, 3.2E2
chararray  | a string                | abcde
bytearray  | a blob                  |
tuple      | ordered set of fields   | (4, Brent, 388.25) - kind of like a row in a database
bag        | collection of tuples    | {(4, Brent, 388.25), (20, Amanda, 36.7)} - kind of like multiple rows in a database; could also be thought of as an array, list, collection, etc.
map        | set of key/value pairs  |


Pig Latin Basics

Terminate with a semi-colon
/*...*/ comment block
-- single line comment
Names of relations and fields are case sensitive
Function names are case sensitive
keywords such as LOAD, USING, AS, GROUP, BY, etc are NOT case sensitive

Loading Data

A = load '/datadir/datafile' using PigStorage('\t');
NOTE: tab is the default delimiter
NOTE: If the path to a file is a directory then all files in the directory will be loaded

The default is PigStorage, but there are also BinStorage, TextLoader, and JsonLoader, and you can code your own loader as well.
You can also define a schema so you can refer to fields by name (f1, f2, f3, etc.):
A = load '/datadir/datafile' using PigStorage('\t') as (f1:int, f2:chararray, f3:float);

If you don't specify the schema you need to use positions. For example, $0 is the first position.
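For example, a quick sketch that projects the first and third fields of A by position:

b = foreach A generate $0, $2;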

OUTPUT

STORE is the opposite of LOAD; it writes a relation out to the file system.
Formats: PigStorage(), BinStorage(), PigDump(), JsonStorage()
DUMP writes the results to the screen.
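For example, assuming relation A from the load example above (the output path is illustrative):

store A into '/datadir/out' using PigStorage(',');
dump A;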


Operators

Arithmetic: +, -, *, /, % (modulo), ? : (bincond)
Boolean: and, or, not
Comparison: ==, !=, <, >, is null, is not null

Parameters

Parameters can be passed into a pig script via a parameter file or the command line.
Parameters are referenced using $
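For example, a minimal sketch (the parameter name is illustrative). Pass the value on the command line:

>pig -param pubyear=2014 myscript.pig

and reference it inside the script:

b = filter data by pubyear == $pubyear;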

Relational Operators

FILTER

Selects tuples from a relation based on some criteria
b = filter data by pubyear == 2014;

ORDER BY

Sorts a relation on one or more fields
b = order data by author ASC;

FOREACH

Projects fields into a new relation. Under the hood this is just a loop over each tuple in the data. It is useful, for example, when you only want to return a subset of the fields. Calculations can also be done here, for example arithmetic between fields.
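A small sketch, assuming the schema from the load example above:

b = foreach A generate f1, f3 * 1.1;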

GROUP

Groups together tuples that have the same group key; the group key can be a single field or multiple fields (enclose multiple fields with parentheses). The result of a GROUP is a relation that contains one tuple per group. Each tuple has two fields: group (the key) and a bag containing all of the tuples that have that key.

Example:
data =
(1,2,3)
(4,5,6)
(7,8,9)
(4,3,2)
myGroup = group data by f1;

Result:
(1,{(1,2,3)})
(4,{(4,5,6),(4,3,2)})
(7,{(7,8,9)})

COGROUP

Same as the GROUP operator, but by convention used when grouping multiple (up to 127) relations at the same time. Similar results to GROUP except the resulting tuple has one group field and then one field for each relation we are cogrouping by. So if we are cogrouping using two relations, then each resulting tuple would be (group, value for relation1, value for relation2), where the relation1 and relation2 values are bags of tuples just like with the GROUP operator.

Dereference

Allows us to reference a field in a tuple or bag that is outside the scope of the current operator. This can be used with the FOREACH operator.
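For example, a sketch that reaches into the bag produced by the GROUP example above (assuming the tuples have a declared schema with a field f2):

b = foreach myGroup generate group, data.f2;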

DISTINCT

Removes duplicate tuples found in a relation

UNION

Merges the contents of two or more relations.
NOTE: The relations do NOT have to have the same number of fields in them like they would in SQL.

SPLIT

Partitions a relation into two or more relations based on some condition

CROSS

Computes the cross product of two or more relations

JOIN / INNER

Performs a join (equijoin) on two or more relations using one or more common field values. Like a SQL join.

JOIN / OUTER (full, right, left)

Performs a join on two or more relations using one or more common fields. Works like you would expect if you are familiar with SQL outer joins.

 

Evaluation Functions 

Requires GROUP ALL or GROUP BY

  • COUNT - Counts the number of elements in a bag (ignores null values)
  • COUNT_STAR - Computes the number of elements in a bag, including nulls
  • MAX - Computes the maximum value in a single-column bag
  • MIN - Computes the minimum value in a single-column bag
  • SUM - Computes the sum of the numeric values in a single-column bag
  • AVG - Computes the average of the numeric values in a single-column bag

Do NOT require GROUP ALL or GROUP BY

  • CONCAT - Concatenates two columns
  • DIFF - Compares two fields in a tuple
  • IsEmpty - Checks if a bag or map is empty
  • SIZE - Computes the number of elements based on any Pig data type
  • TOKENIZE - splits a string and outputs a bag of words
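A small sketch of the GROUP BY requirement, reusing the schema from the load example above:

grouped = group A by f2;
stats = foreach grouped generate group, COUNT(A), AVG(A.f3);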

Math Functions

Based on Java Math class
Samples: ABS, CEIL, etc

String Functions

Based on Java String class
Samples: STRSPLIT, SUBSTRING, REPLACE, REGEX_EXTRACT, REGEX_EXTRACT_ALL etc

Tuple, Bag, and Map Functions

  • TOTUPLE - converts one or more expressions to tuple
  • TOBAG - converts one or more expressions to type bag
  • TOMAP - converts pairs of expressions into a map

External Type Operators

  • MAPREDUCE - Executes native MapReduce jobs inside a Pig script
  • STREAM - Sends data to an external script or program
  • REGISTER - Registers a JAR file so that the UDFs in the file can be used
  • fs - Invokes any FsShell command from within a Pig script or the Grunt shell
  • exec - Runs a Pig script from within the Grunt shell (e.g., grunt> exec myscript.pig)
  • EXPLAIN - Displays the execution plan; used to review the logical, physical, and MapReduce execution plans

  References:

Pig Latin Basics - great reference
Introduction to PIG at the Big Data University - a nice free training class. Nearly all of the information above is from this class, in some cases copied.

Tuesday, September 30, 2014

Hadoop (HDFS) Command line basics

Once you get Hadoop installed you can open a terminal (aka command line). There is a program called hadoop, and we pass it different switches and arguments to make it do what we want. Most of the hadoop fs (file shell) commands behave like the corresponding UNIX commands. Below are some of the commands you may find useful.

As a general rule, all hadoop filesystem commands start with hadoop fs.

Referencing HDFS Paths
When accessing the HDFS filesystem you need to use the hostname and port associated with the name node. In the examples below, the host is bivm and the name node is running on port 9000. You can also make a relative reference by leaving off the hdfs://bivm:9000/ prefix.

For example, to copy a file from the local file system to the HDFS file system we could specify the full path with:

hadoop fs -put file:///home/biadmin/test.txt hdfs://bivm:9000/user/biadmin/test.txt

However, we can also do

hadoop fs -put file:///home/biadmin/test.txt test.txt

This convention applies to all hadoop fs commands.

help - To get help and see all the commands for hadoop fs
hadoop fs -help

help - Get help on a fs command
hadoop fs -help du

ls - Show the files in the current user's directory

hadoop fs -ls
or
hadoop fs -ls /user/biadmin (assuming the user name is biadmin)


ls - Show the files in the user directory

hadoop fs -ls /user

cp - Copy a file from local file system to HDFS
hadoop fs -cp file:///home/biadmin/test.txt hdfs://bivm:9000/user/biadmin/test.txt

put or copyFromLocal - Copy files from the local filesystem (the opposite of copyToLocal)
hadoop fs -put file:///home/biadmin/test.txt hdfs://bivm:9000/user/biadmin/test.txt
hadoop fs -copyFromLocal file:///home/biadmin/test.txt hdfs://bivm:9000/user/biadmin/test.txt


get or copyToLocal - Copies files from HDFS to the local filesystem (the opposite of copyFromLocal)
hadoop fs -copyToLocal hdfs://bivm:9000/user/biadmin/test.txt file:///home/biadmin/test.txt
hadoop fs -get hdfs://bivm:9000/user/biadmin/test.txt file:///home/biadmin/test.txt


tail - View the last few lines of a file
hadoop fs -tail hdfs://bivm:9000/user/biadmin/test.txt

cat - View the entire contents of a file
hadoop fs -cat hdfs://bivm:9000/user/biadmin/test.txt

rm - remove a file
hadoop fs -rm hdfs://bivm:9000/user/biadmin/test.txt

du - find the size of a file
hadoop fs -du hdfs://bivm:9000/user/biadmin/test.txt

du - get the size of all files in a directory
 hadoop fs -du hdfs://bivm:9000/user/biadmin

du - get the total size of all files in a directory
hadoop fs -du -s hdfs://bivm:9000/user/biadmin

mkdir - make a new directory
hadoop fs -mkdir hdfs://bivm:9000/user/biadmin/test

Other Unix-like HDFS Commands

chgrp
chmod
chown
mkdir
mv
rm
stat
setrep - Sets the replication factor of a file or of an entire tree
getmerge - Gets all files in the directories that match the source pattern and concatenates them into a single file on the local filesystem
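For example, a sketch of getmerge (the paths are illustrative):

hadoop fs -getmerge hdfs://bivm:9000/user/biadmin/output /home/biadmin/merged.txt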

Piping
You can pipe the results of these commands to unix commands. For example, we can grep the result of the ls command.

hadoop fs -ls | grep test

The result would be something like:
-rw-r--r--   1 biadmin biadmin          5 2014-09-24 00:48 test.txt
Saturday, September 20, 2014

Big Data Concepts

Why Big Data

Big Data conceptually has been around ever since we had data. The core problem is how to store and process data on hardware that is smaller than the data itself. Big Data, as the name implies, is needed when dealing with very large amounts of data. Big is relative, but generally we are dealing with terabytes, petabytes, exabytes, etc. However, there is no real threshold for using Big Data. Consider that a person's DNA sequence is only about 800MB, but it contains 4 billion pieces of information and has lots of patterns in it. The problem is that processing it is slow using conventional databases. Big Data would still be a good candidate for this because of the complexity of the data and the processing power needed to analyze it. It is great for unstructured data, but can be used with structured data as well.

In short the amount of data being generated is growing exponentially and most of that data is unstructured or semi-structured. To process that data we generally need more power and storage than a single database, server, etc can handle.

The 3 V's of Big Data


  • Velocity - how fast is data being produced
  • Volume - how much data is being produced
  • Variety - how different is the data

What is Hadoop

Apache, the creators of Hadoop say
"The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures."

I would add some general observations:

  • You can kind of think of it as software RAID in that it spreads data among different disks. Instead of a hardware controller, a dedicated name node coordinates between computers (instead of disks on one machine as with RAID). The big advantage is that we are no longer bound by how much one server can hold, the processing power of one server, or the IO of a single disk, because requests run in parallel.
  • It is more than a place to store files, though one part of it, HDFS, is the distributed file system. It includes an ever-growing collection of tools to process the data.
  • It is a self-healing technology: if one computer or rack of computers goes down, it will detect the failure and use the copies that are on other computers. Assuming there is space available somewhere, it will rebuild the data to reduce the risk if another server goes down.
  • The great thing is that when we run out of space (and in big data you will by the nature of the domain) we can add more computers to the configuration and tell Hadoop to re-balance, and it will move data around to make use of the new space.


The core of many Big Data systems


  • Open source project by The Apache Software Foundation
  • Written in Java
  • Great Performance
  • Reliability provided by replication of data between computers


Optimized to handle


  • Massive amounts of data through parallelism
  • A variety of data (unstructured, semi-structured, and structured)
  • Inexpensive commodity hardware

Projects Associated with Hadoop


  • Eclipse is a popular IDE donated by IBM to the open source community.
  • Lucene is a text search engine library written in Java.
  • HBase is the Hadoop database.
  • Hive provides data warehousing tools to extract, transform, and load data, and then query this data stored in Hadoop files.
  • Pig is a high-level language that generates MapReduce code to analyze large data sets.
  • Jaql is a query language for JavaScript Object Notation (JSON).
  • ZooKeeper is a centralized configuration service and naming registry for large distributed systems.
  • Avro is a data serialization system.
  • UIMA is the architecture for the development, discovery, composition, and deployment for the analysis of unstructured data.

What it is NOT good for


  • Not designed for OLTP, OLAP. It is not a replacement for RDBMS
  • Not designed for random access such is the case with RDBMS
  • Not good for processing lots of little files, but vendors are working to make this work better.
  • Not good for low latency data access
  • Not good for work that must be sequential or cannot be parallelized
  • Not good for complex calculations with little data.

Typical Sources for Big Data



  • RFID Readers
  • Shopping / Transactions
  • Mobile Devices
  • Internet users
  • Twitter
  • Sensor data


Friday, August 15, 2014

Free Convert for Excel files (.xls to .xlsx)

I was pleasantly surprised that Excel 2010 has a command line tool that will convert .xls to .xlsx files. Here is how you use it.

In this example we are converting a file called MyFile.xls in the temp directory to MyFile.xlsx in the same temp directory. I didn't have any luck getting this to work without using absolute paths.

  1. Open a command prompt 
  2. Execute the following (after adjusting it to meet your specific environment):
    "C:\Program Files\Microsoft Office\Office14\excelcnv.exe" -oice "C:\temp\MyFile.xls" "C:\temp\MyFile.xlsx"


Since this is command line driven you can script many files easily.


@ECHO OFF
FOR %%X in (".\*.xls") DO IF NOT %%~xX == .xlsx echo Converting "%%~dpnxX"  & "C:\Program Files\Microsoft Office\Office14\excelcnv.exe"  -nme -oice "%%~dpnxX" "%%~dpnX.xlsx" 


WARNING: For some reason Excel may tell you that it didn't launch correctly and ask if you want to run it in safe mode. I have no clue why this happens, but clicking No each time seems to work fine. It will do this for each file.

Friday, July 18, 2014

Remove alpha characters from string using SQL

If you have a string that has both numbers and alpha characters in it and want to remove all letters, then this is a simple function that you can use on a column in SQL Server. Note that REPLACE honors the collation of its input, so under the typical case-insensitive collation the uppercase-only replacements below remove lowercase letters as well (which is why the usage example strips 'abc').


create function RemoveAlphas(@Text as nvarchar(255))
returns nvarchar(255)
as
BEGIN
Declare @Result as nvarchar(255)
Set @Result = @Text
Set @Result = Replace(@Result, 'A', '')
Set @Result = Replace(@Result, 'B', '')
Set @Result = Replace(@Result, 'C', '')
Set @Result = Replace(@Result, 'D', '')
Set @Result = Replace(@Result, 'E', '')
Set @Result = Replace(@Result, 'F', '')
Set @Result = Replace(@Result, 'G', '')
Set @Result = Replace(@Result, 'H', '')
Set @Result = Replace(@Result, 'I', '')
Set @Result = Replace(@Result, 'J', '')
Set @Result = Replace(@Result, 'K', '')
Set @Result = Replace(@Result, 'L', '')
Set @Result = Replace(@Result, 'M', '')
Set @Result = Replace(@Result, 'N', '')
Set @Result = Replace(@Result, 'O', '')
Set @Result = Replace(@Result, 'P', '')
Set @Result = Replace(@Result, 'Q', '')
Set @Result = Replace(@Result, 'R', '')
Set @Result = Replace(@Result, 'S', '')
Set @Result = Replace(@Result, 'T', '')
Set @Result = Replace(@Result, 'U', '')
Set @Result = Replace(@Result, 'V', '')
Set @Result = Replace(@Result, 'W', '')
Set @Result = Replace(@Result, 'X', '')
Set @Result = Replace(@Result, 'Y', '')
Set @Result = Replace(@Result, 'Z', '')
return @Result
END

Usage:

select dbo.RemoveAlphas('123abc456DEF')

Returns 123456
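If the 26 REPLACE calls feel heavy-handed, here is a more compact sketch using PATINDEX and STUFF to strip characters until no letters remain; the function name is hypothetical:

create function RemoveAlphas2(@Text as nvarchar(255))
returns nvarchar(255)
as
BEGIN
Declare @Result as nvarchar(255)
Set @Result = @Text
-- Remove one matching character at a time until no letters remain
While PatIndex('%[A-Za-z]%', @Result) > 0
Set @Result = Stuff(@Result, PatIndex('%[A-Za-z]%', @Result), 1, '')
return @Result
END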

Wednesday, July 9, 2014

Using T-SQL to format date as yyyy-mm-dd

Surprisingly, MS SQL Server (prior to SQL Server 2012, which added the FORMAT function) doesn't provide custom date formatting. Instead you need to use one of the existing convert styles or use C# (SQL CLR) to implement it, but the latter seems a bit overkill for our purposes here. The styles are defined here. Below are some ways to get a datetime or date column to print out in the yyyy-mm-dd format.
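For what it's worth, if you are on SQL Server 2012 or newer, the FORMAT function handles this directly:

SELECT FORMAT(GETDATE(), 'yyyy-MM-dd')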

Method 1
This is simple and straightforward. Interestingly, I don't see it defined here, but it works.

WARNING:
Since it is not documented it is up to you if you want to use it or not. It has been around for many years, but it is unknown if it will be there in the future. Use this option at your own risk.

In this example, the length of 20 is used to show it doesn't matter, but any size could be used since it actually gives us the format we are looking for.

SELECT CONVERT(NVARCHAR(20), GETDATE(), 23)

That will give you 2014-07-09.

NOTE: If you would like explore other undocumented formats, check this page out.

Method 2

A clever way is to use the 126 format which has hours, minutes, seconds, etc in it and just take the first 10 characters which is in the format yyyy-mm-dd. We could get a substring, but there really isn't a need since it will be implicitly truncated to 10 characters when we use char(10) or varchar(10) as our datatype we are converting to.

SELECT CONVERT(char(10), GetDate(),126)

That will give you 2014-07-09.


Method 3
If you have a slightly different format requirement, such as slashes or dashes, you can convert using a close style and replace the separator:
select Replace(convert(nvarchar(10), GETDATE(), 102), '.', '-')

This works very simply because 102 is defined in the format yyyy.mm.dd and I am just replacing the periods with dashes. Pretty simple and effective.

Method 4
If you want it in the format yyyymmdd (no dashes) then you can use the 112 format as shown below.

select convert(nvarchar(10), GETDATE(), 112)

That will give you 20140709.

Wednesday, July 2, 2014

Get a list of tables in SQL Server that don't have primary keys defined

In SQL Server it is a best practice for all tables to have primary keys defined. A primary key is really a constraint. Below is a query to get a list of tables and the name of the primary key constraint that is associated with each table. If the CONSTRAINT_NAME column is null then the table doesn't have a primary key defined.


select t.TABLE_SCHEMA, t.TABLE_NAME, c.CONSTRAINT_NAME
from INFORMATION_SCHEMA.TABLES t
left outer join INFORMATION_SCHEMA.TABLE_CONSTRAINTS c
on (t.TABLE_SCHEMA = c.TABLE_SCHEMA and t.TABLE_NAME = c.TABLE_NAME and c.CONSTRAINT_TYPE = 'PRIMARY KEY')
where t.TABLE_TYPE = 'BASE TABLE' -- filter here so views don't show up as false positives
order by t.TABLE_NAME


You can extend the where clause with

and c.CONSTRAINT_TYPE is null

to filter the results to just the tables that don't have a primary key.
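Putting it together, a sketch of the filtered version:

select t.TABLE_SCHEMA, t.TABLE_NAME
from INFORMATION_SCHEMA.TABLES t
left outer join INFORMATION_SCHEMA.TABLE_CONSTRAINTS c
on (t.TABLE_SCHEMA = c.TABLE_SCHEMA and t.TABLE_NAME = c.TABLE_NAME and c.CONSTRAINT_TYPE = 'PRIMARY KEY')
where t.TABLE_TYPE = 'BASE TABLE' and c.CONSTRAINT_TYPE is null
order by t.TABLE_NAME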

Find blank rows in a SQL Server Database

After importing data into a SQL Server database there are sometimes blank rows that get created, depending on what your data source looks like. Often when using Excel as a data source, extra rows will be created with all blank values. Since the table doesn't by default have a primary key, all columns can be null. Finding which tables have blank rows and then deleting them can be time consuming. The script here will make this much easier.

Disclaimer
I have used this script successfully on my databases, but please, please, please make a backup of your database BEFORE you execute the following since it can affect all your tables. I am of course not responsible for any data loss caused by this script. 

Executing the script below does NOT actually do the deletes. You will still need to copy and paste the generated SQL into SSMS and execute it. I highly suggest you read the generated SQL to make sure it is doing what you want before you do the final execution.

create table #BlankRowCounts(TableName nvarchar(255), NumBlankRows int)
Declare @SQL as nvarchar(MAX)
select
    @SQL = ISNULL(@SQL + ' union ' , '')
    + 'select ' +
'''' + TABLE_NAME + ''' as TableName, ' +
'COUNT(1) as NumBlankRows' +
' from ' +
'[' + TABLE_NAME + ']' +
' where ' +
dbo.GetColumnList(TABLE_NAME, 1, ' is null and ') + ' is null'
from INFORMATION_SCHEMA.TABLES
where TABLE_TYPE = 'BASE TABLE'
ORDER BY TABLE_NAME

insert into #BlankRowCounts(TableName, NumBlankRows)
exec sp_executesql @SQL


select TableName, NumBlankRows,
'select ' +
'''' + TableName + ''' as TableName, ' +
dbo.GetColumnList(TableName, 1, ', ') + 
' from ' +
'[' + TableName + ']' +
' where ' +
dbo.GetColumnList(TableName, 1, ' is null and ') + ' is null' as SelectStmt,
'delete from ' +
'[' + TableName + ']' +
' where ' +
dbo.GetColumnList(TableName, 1, ' is null and ') + ' is null' as DeleteStmt
from
#BlankRowCounts
order by NumBlankRows desc
drop table #BlankRowCounts

IMPORTANT
You will also need to get the code for GetColumnList() function here.

The results of this SQL are simple. There are four columns:
  • TableName - The table for which the statements will affect
  • NumBlankRows - The number of rows in the table (see TableName) that have all blank columns
  • SelectStmt - The select statement you can copy and paste into SSMS to actually see for yourself that the columns are null. You don't have to execute these, but they are here to convince yourself that the data is blank.
  • DeleteStmt - The delete statement you can copy and paste into SSMS to actually do the deleting of the rows that have all the columns as null. I highly recommend reading this BEFORE you execute it. Also, consider backing up your data if the data is important to you.







Get Comma Separated List of Columns for a Table using T-SQL

Using T-SQL (Microsoft SQL Server) you can get a list of columns (delimited by commas or another delimiter of choice) for a given table using the function below. There are two ways to call it. Either one works, but which you use is up to your personal preferences and also how safe the column names are.

If ever in doubt, passing a 1 for @IncludeBrackets is safest because it puts all the column names in square brackets. This allows column names to have spaces and other special characters that would not normally be allowed. This often happens when importing data from Excel and using the default names for the columns, because Excel column headings typically have spaces in them since they are meant to be human readable.

With that said, if I created the columns I always use just alphanumeric characters and no spaces, so my column names are known to be safe. In this scenario, I personally feel it is easier to read the column names without the brackets, so I pass 0 for the @IncludeBrackets parameter.

The first parameter is simply the table name.

Basic Usage

To use the function on a table called Person do the following.

select dbo.GetColumnList('Person', 1, ', ')
sample results: [FirstName], [LastName], [Phone]
or
select dbo.GetColumnList('Person', 0, ', ')

sample results: FirstName, LastName, Phone


Function Definition (Code)


Here is the code to create the SQL function



create function GetColumnList(@TableName as nvarchar(255), @IncludeBrackets as bit, @Delimiter as nvarchar(500))
returns nvarchar(max)
as
BEGIN
    Declare @ColumnList as nvarchar(MAX)
    Declare @BeginningBracket as nvarchar(1)
    Declare @EndingBracket as nvarchar(1)

    -- Decide whether to wrap each column name in square brackets
    if @IncludeBrackets = 1
    BEGIN
        SET @BeginningBracket = '['
        SET @EndingBracket = ']'
    END
    else
    BEGIN
        SET @BeginningBracket = ''
        SET @EndingBracket = ''
    END

    -- Concatenate the column names, inserting the delimiter between them
    select
        @ColumnList = ISNULL(@ColumnList + @Delimiter, '')
        + @BeginningBracket + COLUMN_NAME + @EndingBracket
    from INFORMATION_SCHEMA.COLUMNS
    where TABLE_NAME = @TableName
    ORDER BY COLUMN_NAME

    return @ColumnList
END



Advanced Usage

If you want to do the same thing but for all tables in your database, you can do something like the following. You can of course add a where clause, etc. to select just the tables you want as well.


select 
dbo.GetColumnList(TABLE_NAME, 1, ', ')
from INFORMATION_SCHEMA.TABLES
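For instance, a sketch that builds a runnable SELECT statement for every base table:

select 'select ' + dbo.GetColumnList(TABLE_NAME, 1, ', ') + ' from [' + TABLE_NAME + ']'
from INFORMATION_SCHEMA.TABLES
where TABLE_TYPE = 'BASE TABLE'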



Wednesday, April 16, 2014

FREE or Open Source or Inexpensive options to compare data (in tables) in SQL Server

Intro

Visual Studio 2012 (and some earlier editions) includes Schema comparison for SQL Server. This is NOT what I am writing about here today. Assuming you have used this tool to make tables, etc. the same, you may have a need like I do when moving data from dev to production databases or the reverse. There are several ways to go about this.

FREE - SQL Server Data Tools 

Probably the best place to start is SQL Server Data Tools, which is available from Microsoft. It includes, among other things, the ability to compare data in SQL Server tables. It is available for Visual Studio 2010 and newer. There was an option in some editions of Visual Studio 2010, but not in Visual Studio 2012; to get the functionality in Visual Studio 2012 you need SQL Server Data Tools installed. Once you have it installed you will have functionality very similar to what was available in VS 2010 or the RedGate product. Here is a direct link for the download of the English ISO.

One of the nice things about this option is that it is well integrated into Visual Studio 2012 and uses the same source and destination configurations as the SQL Schema Comparison that is built into VS 2012. You can select what tables you want to compare, what columns in the tables, whether you want source or destination records, etc. It will do the update for you, or it can generate a SQL script that you can run manually. It gives you a nice visual representation of the differences and lets you select the rows you want to change. It seems to be pretty fast. The generated SQL script even disables constraints as needed. It also seems to handle nulls properly. This is a very nice option for free!

It appears it can be called from the command line as well, but I have not tried it.

Here is the blog for the SSDT team.



FREE - tablediff.exe

IMHO, this may be the best choice for scripted options. Believe it or not, tablediff.exe is a utility that comes with SQL Server 2005 or greater. I believe this is the tool that SQL Server uses when replicating tables, though that is just what I read from someone else. It will tell you on a row by row and column by column basis what is different. It will even generate the SQL scripts needed to make the destination table look like the source table. As far as I know you cannot download it separately. However, it is installed when you install SQL Server 2005 or newer and choose the SQL Server Replication feature. In SQL Server 2008 R2 it is included by default, but I'm not sure about the other versions. On my machine it was located at C:\Program Files\Microsoft SQL Server\100\COM\tablediff.exe. Once you find it you can type tablediff.exe -? for the options or refer to the documentation. The parameters are pretty well documented and easy to follow, so I won't go into all the options, but here is an example of how you would generate a change script (SQL) and see what the differences are.

C:\Program Files\Microsoft SQL Server\100\COM>tablediff.exe -sourceserver MySrcServer -sourcedatabase MyDevDB -sourceschema dbo -sourcetable Person -sourceuser User1 -sourcepassword User1Pwd -destinationserver MyDestServer -destinationdatabase MyProdDB -destinationschema dbo -destinationtable Person -destinationuser User1 -destinationpassword User1Pwd -c -o c:\temp\diff.txt -f c:\temp\diffgen.sql

This will generate two files. diff.txt which will have the differences, and diffgen.sql which will be the SQL script you can execute to make the destination table the same as the source table.

WARNING:
Be careful, the SQL will also generate delete statements for your destination table. This may or may not be what you want so just be aware. I recommend backing up your destination table before doing this operation.

Also, it doesn't appear to generate correct scripts for null values. It puts null in single-quotes. This can be changed easily with a search and replace though.

Keep in mind this is per table. If you have lots of tables and you want them all to be updated it could be done also, but it is a bit tedious. However, the nice thing about this tool is that since it is command line, once you have it set up you can run it again and again with little to no effort.

If you want you can even download a GUI for the command line tool. NOTE: You still need the command line tool. I was not able to get it to work with SQL Server 2008 R2, but you can decompile it and fix it I'm sure. Does it work for anyone else? All it does is exactly what you do on the command line so I'm not sure I really see the benefit if you are going to want to run it more than once.

Inexpensive - SQLDBTools

If I had a little money to spend on the solution, SQLDBTools costs $60 and seems to do pretty much what the RedGate products do, but for a fraction of the cost. It does both schema and data comparisons, which is nice to have all in one tool. It does visually let you see the differences and it also generates the change script. I have not tried this product, but it looks like the best product for the money (if you are going to pay for a solution). Given that SQL Server Data Tools now exists, I don't really see the advantage or reason to pay for this tool.

FREE - Linked Server

This is a less desirable option because it doesn't do anything automatically for you. After you have a linked server you can do queries to see the differences between tables. This does nothing for automatically generating the update scripts. This solution is good for analysis only.

Inexpensive - Beyond Compare

Beyond Compare does a nice job of visually showing the differences between two files. It doesn't know anything about SQL or databases, but as long as you use SSMS to export the data to CSV, Excel, or tabular data, Beyond Compare will show you the differences in an Excel-like manner. Again, this does nothing for generating the change script and could be slow for very large tables. This is best for comparing two adhoc queries in my opinion.

Additional Info

I found this site that has a pretty exhaustive list of tools for SQL Server that may be useful.

Thursday, April 3, 2014

Capriza Review


Overview

Capriza is designed specifically for non-developers. It rapidly converts, transforms, and optimizes existing web-based desktop applications into secure, lightweight mobile apps (zapps), complete with modern mobile capabilities such as GPS, camera, barcode scanning, click-to-call, etc.

Key Features

  • Zero coding required, but can do own coding also
  • Zero APIs
  • Citrix-like or live screen-scraping concept
  • Very fast app creation
  • Produces HTML5-based mobile apps (Capriza calls them zapps) from standard web-based applications that were targeted to run on the desktop browser.
  • Mobile experience is different from desktop experience; it is generally simplified
  • Runs on any modern device and platform
  • Mobile extension kit to add custom widgets, native functionality, etc
  • SSO
  • Optimized for SAP and SalesForce.com, but can work on any web application
  • Mobile services such as
    • GPS
    • Camera
    • Bar code scanner
    • Click-to-call
    • Location services

Distribution (MDM)

  • Capriza Native App: available free through either the Apple App Store or Google Play; users log in to see apps
  • Capriza HTML App: For enterprises that don't want a native app use any modern browser to access the non-native app.
  • Custom Enterprise URL: Centrally distribute and manage apps through a custom, corporate branded internet or intranet URL/Domain.
  • Enterprise App Store (EAS) for distributing apps that is corporate branded.
  • Homegrown EAS

Monitoring

  • Cloud-based, real-time management dashboard
    • Users
    • Infrastructure
    • Zapp Health monitoring
  • Analytics
    • Usage
    • Adoption
    • Feedback

Security

  • Can run their software behind the corporate firewall in 8 minutes to access apps on the intranet
  • If it is public the cloud can be leveraged

Platforms

  • iOS
  • Android
  • BlackBerry
  • Windows Phone
  • Samsung BADA
  • Firefox OS

Devices

  • Smartphones
  • Tablets
  • Desktops

Browser Support

  • Safari
  • Chrome
  • Android stock browser
  • Others

Professional Services are available

How it Works

Zapps work in a manner similar to Citrix. The Zapps are actually thin clients. These clients don't store anything locally on devices which lowers security concerns. All communication between components use HTTPS. Here is how a typical interaction would look:

  1. User launches a zapp on their mobile device.
  2. The zapp connects to the Capriza Relay Server (on premise or cloud) via HTTPS. It manages the communication between the zapp and the Runtime Agents.
  3. The Capriza Relay Server starts a Runtime Agent, which is a headless browser.
  4. The Runtime Agent connects to the legacy web app, which is rendered (not visible to anyone) in the Runtime Agent.
  5. The Runtime Agent (or possibly the Relay Server; I'm not sure) translates the legacy web app to a mobile-friendly format and sends it to the zapp (mobile app) to be displayed to the user.

Developer Experience

  • Capriza Designer (Firefox Plug-in)
  • Walk through web application and drag parts that you are interested in to the mobile app screen to create the screens.
  • Screens are customizable
  • No programming needed
  • Can change layouts, etc.

Changes

  • When the legacy web application changes zapp will need to change also and should be part of change management.

Cost Model

Depends on how want to license
  • Buy Platform for unlimited use
  • Per User per month for smaller uses
  • Prices vary based on specific details of how to deploy, etc.

Conclusions


I have investigated the product. Below is a summary of what I found. In general, please understand that this tool does NOT allow you to add functionality to a mobile application that is not already in the legacy web application. So, it is not a tool for doing new mobile development if there is not an existing web application for it to interact with. The presentation of the user interface is changed to suit a mobile device; this is typically much simplified to be more task specific. It could be very good for bringing existing web applications to mobile devices when source-code-level access to the existing web application is not available. It could also be useful for prototyping changes to web applications that we do have source-code-level access to as well.

Pros:
Very fast application creation when it plays nicely with the Capriza tooling
Works on any mobile device
Low cost of development due to time savings
Distribution simplified
Minor changes such as layout and cosmetics typically do not require the mobile application to be modified
Cons:
Cannot be any faster than the existing web site and there is some overhead
Some web apps will work better with this tool than others
Major changes to legacy web application affect the mobile application

Monday, March 17, 2014

How to deploy Iron Speed Designer web application


I love Iron Speed Designer (ISD) for lots of reasons. I also hate certain things about it. Today, I am going to talk about how the deployment is lacking, but how you can work around it.

My experience:

  • If you want pure frustration, click the Deploy button or use the menu in Iron Speed Designer. This button will allow you to deploy exactly once. After that it will tell you there is an existing profile. In my current version of Iron Speed Designer (ISD), it tells me that there is no project for some reason. BTW, ISD support says this is by design. OMG. If you want to deploy to anywhere except a local directory it doesn't support that and you have to use Visual Studio anyway. Believe it or not, I get different results between publishing via this and via Visual Studio.
  • Don't use Visual Studio to publish your project either. It seems to cause some weird problems where ISD can't build anymore and can't find stored procedures at runtime when using Active Directory or Windows Authentication.
The workaround:
  • Just do a simple file xcopy deployment of the project directory to the server. No issues this way.

Does Iron Speed Designer (ISD) use username or domain\username in the user table?


When using Windows Authentication with Iron Speed Designer (ISD) you can use a user table to specify who can access your application. I often forget what the username column in Iron Speed Designer (ISD) is supposed to contain.

The confusion in my mind is that Windows authentication requires a domain to be specified. This may not be apparent if using IE, but with other browsers such as Firefox you will be prompted for your credentials and required to put the username in the format: domain\username. In the case of ISD, it appears that Windows authentication is still handled by IIS / Windows / Active Directory as it is in any other ASP.NET application that uses Windows Authentication. Once the user is authenticated though, ISD queries the database tables that you specify. In these queries it uses just the username portion of the domain\username you typed (or IE passed on for you). I don't think this would work very well if your network has multiple domains, but I have not confirmed this.

So, to put it simply, store just the username (no domain) in the column in the user table you specify.

Wednesday, March 5, 2014

Oracle APEX Review

Overview
Oracle APEX = Oracle Application Express
Been around since 2004 under various names
Browser Based development and deployment
Think of it as a replacement for MS Access, but on the web and multi-user with an Oracle backend.
RAD tool for the Oracle Database; think Forms over data
Declaratively build web 2.0 applications
Leverage SQL and PL/SQL skills.
Multi-tenant Hosting
Departmental solutions are the focus
The Oracle store is built with APEX
Lots of options for Authentication or even custom
Built-in support to prevent URL tampering
20 canned themes or you can create your own, including one for mobile.
Wizards to create forms (including master-detail, etc.), though they produce fairly basic layout and functionality.
Create web services with a wizard. A web service can be created from a region also.
Can add regions to a page and add most anything to the region, including forms, etc.

Cost
Fully supported by Oracle
Free with Oracle Database (including Oracle XE)


Skill Set Required
PL/SQL
SQL

Skill Set Required for Customization
HTML
JavaScript
CSS


Advantages
Robust migration path for Oracle Forms applications to APEX.
Very easy to generate data-driven CRUD data-entry style applications that include simple reporting.
Migrating MS Access or Excel files to a multi-user web environment
Similar to SharePoint lists in that you can create tables and a UI based on an Excel spreadsheet.
Free with Oracle database license
RAD or demo or POC
Built-in themes
Customization done using standards such as JavaScript, HTML, and CSS.


Disadvantages
Debugging can be painful. No breakpoints.
Business logic is written in PL/SQL. No layers such as a business layer and data access layer; SQL or stored procedures are referenced directly.
The web-based development environment feels a bit clumsy and slow.
Tightly bound to PL/SQL and Oracle.
You must work within the paradigm that is defined by APEX. If the application doesn't fit the paradigm the effort drastically increases. For instance, JavaScript and HTML/CSS become the way of working, which is a much different skillset. Alternatively, plug-ins can be created to extend APEX.
The migration path from MS Access to APEX is limited and only works with simple applications.
The time you save in development can be quickly lost in debugging, support, and customization.
No version control


Architecture

  • Web Browser
  • Apache with mod_plsql/EPG web listener
  • Application Express
  • Meta Data



KEY FEATURES according to a Oracle APEX Specialist

FRAMEWORK
The APEX framework uses SQL and PL/SQL on the back-end, and HTML, CSS, and JavaScript for the user interface. SQL and PL/SQL are solid and proven languages and they allow APEX developers to leverage the features of the Oracle database. HTML, CSS, and JavaScript are industry standard components for building web applications.
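As a flavor of what that back-end code looks like, here is a minimal sketch of the kind of PL/SQL an APEX page process might run; the table name and page item (:P1_COMMENT) are hypothetical:

-- Insert the value of a page item (bind variable :P1_COMMENT) into a table
begin
  insert into feedback (comment_text, created_on)
  values (:P1_COMMENT, sysdate);
end;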

SUPPORT FOR MULTIPLE AUTHENTICATION SCHEMES
APEX supports various authentication schemes such as LDAP, database, Single Sign on, Oracle Access Manager, custom, etc.  This makes it convenient to integrate with any existing applications and authentication systems.

DECLARATIVE MOBILE APPLICATION DEVELOPMENT
APEX is bundled with the popular jQuery mobile libraries and provides declarative support for building mobile applications. A special mobile theme provides APEX mobile applications with typical mobile features such as page transitions and gestures including swipe, pinch, and tap. Custom mobile themes can easily be created with the jQuery mobile theme roller.

RESPONSIVE DESIGN FOR OPTIMAL VIEWING ON DESKTOP, TABLET AND SMARTPHONES
APEX applications can be rendered on desktops, tablets, and smartphones by choosing a theme based on a responsive design. APEX includes a set of modern themes that are based on CSS3 and HTML5, and supports HTML5 charts, and HTML5 item types such as sliders and toggles.

PACKAGED APPLICATIONS – A FULL DEVELOPMENT SUITE
APEX is bundled with a suite of business productivity applications that can be freely used to assist with the management and control of projects. The suite contains applications for bug tracking, issue logs, checklist management, meeting minutes, group calendar, decision management, document management, and project management.

RESTFUL WEB SERVICES
APEX has built-in support for RESTful web services and allows applications to access data and services over the internet or intranet using standard web APIs. Database web services that implement SQL or PL/SQL can also be created. RESTful web services in APEX require the APEX Listener, a J2EE based alternative for the Oracle HTTP server and mod_plsql.

DATABASE INTEGRATION
APEX is a component of the Oracle database, and applications built with APEX can utilize or benefit from any feature of the database such as advanced security, RAC, Spatial, Analytics, Multimedia, XML DB, Job Scheduler, utility packages, etc. APEX uses SQL and PL/SQL to interact with the Oracle host database.

GLOBALIZATION SUPPORT
Applications built in APEX can run concurrently in different languages. Applications are developed in a primary language and can be mapped to a supported target language. Strings in the primary application are exported to an XML Localization Interchange File (XLIFF) where they are translated, imported, and automatically used by the translated application.

TEAM DEVELOPMENT
Team development is a built-in feature that allows a group of developers working on a single application to manage new features, to-dos, bugs, and milestones. Users of an application can provide instant feedback, which can then be classified as a feature, to-do, or bug.

EXTENSIBLE
The APEX development framework supports plugins that allow developers to extend the functionality of their applications with reusable custom or third-party extensions. APEX applications can also be extended with custom HTML, CSS, or JavaScript.

OTHER FEATURES
APEX is also stacked with other utilities and features that greatly improve developer productivity throughout the life of a project.

Version Control Support – Applications can be automatically exported into SQL script files where they can be included in a version control system. APEX provides an application exporter utility as well as an application splitter utility for splitting an application into individual page scripts.

Error Handling – Developers can create a single error handling function that handles exceptions consistently across all pages in an application.

Oracle Forms Migration Tool – If Java/ADF is not for you, APEX is a viable option for converting Oracle Forms applications. APEX includes a forms migration tool to assist with the migration of Oracle Forms applications.

Accessibility – APEX applications have automatic built-in accessibility support. Applications can be rendered in high contrast mode or screen reader mode in order to meet accessibility requirements.       

Utility Reports – APEX has a comprehensive list of reports that provide real-time information on applications. Some of the more useful reports are:

  • Change History – List of changes made by developers
  • Advisor – Quality control review of an application
  • Database Object Dependencies – List of database objects used by an application
  • Debug Messages – List of debug messages generated by an application
  • Recently Updated Pages – List of pages that were recently updated


Features
Reports
Forms
Charts
Calendar
Templates
Navigation
Validations
Processes
Computations
Branches
Web Services
Email Services
Translation Services
Conditional Processing
Authentication
Authorization
Session State Management
Logging & Monitoring
Interactive Reports

Integrating with Other Tools / Services
SQL
PL/SQL
RAC
Spatial
OLAP
Flashback
Web Services
Text
Multimedia
Analytic Functions
Globalization
XML DB
eBusiness Suite


Migration to APEX
Direct Excel Conversion
MS Access Conversion Support
Oracle Forms Conversion Support

Security
Popular authentication supported as well as custom
Can hide columns based on user access level


Screenshots of the development environment (images omitted; captions below)

Built-in User Management

List of applications in the workspace

List of all pages in an application

Design a page

Options for creating a new application

Add a page

Create a Form

Sample user interface of the application created from a spreadsheet. This is an interactive report: we can add charts, group by, add aggregate columns, filter, highlight, export to CSV, re-order columns, hide columns, save the report, etc.

Conclusion
Oracle APEX may be a good choice depending on your needs. Here are some key criteria for this to be a good choice:

  • Oracle database is your database of choice
  • You know SQL and even better you know PL/SQL
  • Your application is essentially a CRUD, forms-over-data, or data-centric application
  • You want to do RAD
  • You don't want to write lots of code
  • For customization you will need JavaScript and HTML/CSS knowledge
  • For writing Business logic and customizing the application you will need PL/SQL knowledge
  • To get the largest savings in time you will want to use the predefined forms and wizards, but this is not required.

It also means you are willing to give up some of the more common development practices

  • Having separate tiers for business logic and data access. Instead you must want your business logic to be in the database in packages.
  • IDE that runs on your pc
  • Robust debugger
  • Unit tests
  • Version control (Yes, you can export files and add them to version control, but it isn't quite the same thing in my opinion)
  • If you want to do OOP or MVC, this is not a good fit
  • Drag n drop files from your desktop into the IDE
  • Less control over files in your project

If the above criteria are okay with you then Oracle APEX is worth looking at. You get a lot of functionality, similar to what you would get with SharePoint, MS Access, or meta-data tools, but with different strengths and weaknesses as noted above.

