Channel: VBForums - CodeBank - Visual Basic 6 and earlier

[VB6] An ADO "PutString" function

Normally we can use tools like Jet's Text IISAM to import delimited text. But sometimes our delimited text might not be in a file. Perhaps we received it from a web service or a TCP connection or something, and we don't want to take the step of writing the data to disk just to turn around and import it into our database.

The ADO Recordset has a GetString method that converts its contents to a delimited text String value fairly easily. If only we had an inverse function, a sort of PutString we could use.


PutString

Here is a function that does just that. It takes care of parsing the delimited columns and rows and posts these to a database table using an append-only cursor Recordset.

All of this seems pretty well optimized, though with effort you might squeeze out another millisecond or two. The commonly advocated "split the splits" approach is far slower than this logic:

Code:

Private Function PutString( _
    ByRef StringData As String, _
    ByVal Connection As ADODB.Connection, _
    ByVal TableName As String, _
    ByVal ColumnIds As Variant, _
    Optional ByVal ColumnDelimiter As String = vbTab, _
    Optional ByVal RowDelimiter As String = vbCr, _
    Optional ByVal NullExpr As Variant = vbNullString) As Long
    'A sort of "inverse analog" of the ADO Recordset's GetString() method.
    '
    'Returns count of rows added.

    Dim SaveCursorLocation As CursorLocationEnum
    Dim RS As ADODB.Recordset
    Dim ColumnStart As Long
    Dim ColumnLength As Long
    Dim ColumnValues() As Variant
    Dim Pos As Long
    Dim NewPos As Long
    Dim RowLimit As Long
    Dim I As Long
    Dim AtRowEnd As Boolean

    If (VarType(ColumnIds) And vbArray) = 0 Then Err.Raise 5 'Invalid procedure call or argument.

    SaveCursorLocation = Connection.CursorLocation
    Connection.CursorLocation = adUseServer 'Required to create this fast-append Recordset:
    With New ADODB.Command
        Set .ActiveConnection = Connection
        .CommandType = adCmdTable
        .CommandText = TableName
        .Properties![Append-Only Rowset] = True
        .Properties![Own Changes Visible] = False      'Doesn't matter when using exclusive access.
        .Properties![Others' Changes Visible] = False  'Doesn't matter when using exclusive access.
        Set RS = .Execute()
    End With
    Connection.CursorLocation = SaveCursorLocation

    ReDim ColumnValues(UBound(ColumnIds))
    Pos = 1
    Do
        RowLimit = InStr(Pos, StringData, RowDelimiter)
        If RowLimit = 0 Then RowLimit = Len(StringData) + 1
        I = 0
        AtRowEnd = False
        Do
            ColumnStart = Pos
            NewPos = InStr(Pos, StringData, ColumnDelimiter)
            If NewPos = 0 Or NewPos > RowLimit Then
                Pos = InStr(Pos, StringData, RowDelimiter)
                ColumnLength = RowLimit - ColumnStart
                If Pos <> 0 Then
                    Pos = Pos + Len(RowDelimiter)
                    'Auto-handle CrLf when RowDelimiter is vbCr.  GetString()
                    'itself defaults to vbCr as the RowDelimiter.  Some software
                    'strangely enough will use a mix of vbCr and vbCrLf:
                    If RowDelimiter = vbCr Then
                        If Mid$(StringData, Pos, 1) = vbLf Then Pos = Pos + 1
                    End If
                End If
                AtRowEnd = True
            Else
                Pos = NewPos
                ColumnLength = Pos - ColumnStart
                Pos = Pos + Len(ColumnDelimiter)
            End If
            ColumnValues(I) = Trim$(Mid$(StringData, ColumnStart, ColumnLength))
            If Not IsMissing(NullExpr) Then
                If ColumnValues(I) = NullExpr Then ColumnValues(I) = Null
            End If
            I = I + 1
        Loop Until AtRowEnd
        RS.AddNew ColumnIds, ColumnValues
        PutString = PutString + 1
    Loop Until Pos = 0 Or Pos > Len(StringData)
End Function
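
For comparison, the "split the splits" approach mentioned above might look something like this rough sketch. SplitTheSplits is a hypothetical name and is not part of the demo. It only visits the column values and does no database work, so it shows the shape of the technique rather than a drop-in replacement. Every Split() call allocates a fresh String array, which is the likely source of the extra cost.

Code:

Private Function SplitTheSplits( _
    ByRef StringData As String, _
    Optional ByVal ColumnDelimiter As String = vbTab, _
    Optional ByVal RowDelimiter As String = vbCr) As Long
    'Hypothetical comparison routine, not part of the demo.
    '
    'Returns count of column values visited.

    Dim Rows() As String
    Dim Columns() As String
    Dim R As Long
    Dim C As Long

    Rows = Split(StringData, RowDelimiter)          'One new String per row.
    For R = 0 To UBound(Rows)
        Columns = Split(Rows(R), ColumnDelimiter)   'One new String per column.
        For C = 0 To UBound(Columns)
            'A real version would Trim$() Columns(C) and post it here.
            SplitTheSplits = SplitTheSplits + 1
        Next
    Next
End Function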


Demo

PutString is contained in the attached demo within Module1.bas.

This demo creates a new empty database with a single table SOMETABLE on its first run. Once it has an open database connection it first deletes all rows (if any) from SOMETABLE.

Then it creates a big String containing 5000 rows with 8 random data fields (of several types). This String has TAB column delimiters and CR/LF row delimiters.

Then it calls PutString to append the data to SOMETABLE, displays a MsgBox with the elapsed time for the PutString, and ends.
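
A call might look like the sketch below. The column names ID and DESCRIPTION are made up for illustration and are not the demo's actual columns. CN is assumed to be an already open ADODB.Connection. Since PutString is declared Private, the call has to live in the same module.

Code:

Dim Data As String
Dim RowsAdded As Long

'Two rows of two columns: TAB between columns, CR/LF between rows.
Data = "1" & vbTab & "First" & vbCrLf _
     & "2" & vbTab & "Second" & vbCrLf

'The default RowDelimiter of vbCr also swallows the LF that follows it.
RowsAdded = PutString(Data, CN, "SOMETABLE", Array("ID", "DESCRIPTION"))
Debug.Print RowsAdded

Note that with the default NullExpr of vbNullString any empty column is stored as Null.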

The compiled program takes from 0.12 to 0.16 seconds to do the PutString call here, but the Timer() function isn't very accurate for small intervals.
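
If you want finer timing than Timer() gives, one option is the Win32 high-resolution counter. This is a sketch only. ElapsedSeconds is a hypothetical helper that is not in the demo, and the Declare statements must go in the declarations section of the module. Passing a Currency for the 64-bit counter is a common VB6 trick.

Code:

Private Declare Function QueryPerformanceCounter Lib "kernel32" ( _
    ByRef lpPerformanceCount As Currency) As Long
Private Declare Function QueryPerformanceFrequency Lib "kernel32" ( _
    ByRef lpFrequency As Currency) As Long

Private Function ElapsedSeconds(ByVal StartCount As Currency) As Double
    'Hypothetical helper: seconds elapsed since StartCount was captured.
    Dim NowCount As Currency
    Dim Frequency As Currency

    QueryPerformanceCounter NowCount
    QueryPerformanceFrequency Frequency
    'Both values carry the same Currency scaling, so it cancels out here.
    ElapsedSeconds = (NowCount - StartCount) / Frequency
End Function

To use it, capture a start value with QueryPerformanceCounter StartCount just before the PutString call and pass StartCount to ElapsedSeconds right after it.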


Issues

I think I have the bugs out of the parsing logic.

This has only been tested with the Jet 4.0 provider, and I'm not sure how well it will do with other DBMSs. With Jet I found no advantage at all to wrapping the appends in a transaction or using batch updating, both of which common wisdom (which often isn't wise at all) touts as performance boosters. Using any form of client-side Recordset only hurt performance, pretty much as expected.
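
If you want to try the transaction variation against another provider, a wrapper along these lines is one way to do it. PutStringInTransaction is a hypothetical name and is not in the demo. It assumes the provider supports transactions and that PutString lives in the same module.

Code:

Private Function PutStringInTransaction( _
    ByRef StringData As String, _
    ByVal Connection As ADODB.Connection, _
    ByVal TableName As String, _
    ByVal ColumnIds As Variant) As Long
    'Hypothetical wrapper: all rows commit together or none do.
    Dim ErrNumber As Long
    Dim ErrSource As String
    Dim ErrDescription As String

    Connection.BeginTrans
    On Error GoTo Failed
    PutStringInTransaction = PutString(StringData, Connection, TableName, ColumnIds)
    Connection.CommitTrans
    Exit Function

Failed:
    'Save the error details before doing anything else, then re-raise.
    ErrNumber = Err.Number
    ErrSource = Err.Source
    ErrDescription = Err.Description
    Connection.RollbackTrans
    Err.Raise ErrNumber, ErrSource, ErrDescription
End Function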

Of course many variables have been left out. For example, other connections could be updating or holding locks, and that could make a huge difference.

Opening the database with exclusive access gains you a little more performance too. When you aren't sharing a database this is always a good bet, since eliminating locking naturally improves performance. The demo just lets this default to shared access.
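
Asking for exclusive access might look like this sketch. The database path is made up for illustration. The Mode property has to be set while the Connection is still closed.

Code:

Dim CN As ADODB.Connection

Set CN = New ADODB.Connection
With CN
    .Mode = adModeShareExclusive    'Must be set before Open.
    'Hypothetical path, substitute your own database file:
    .Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Data\sample.mdb"
End With

The Open will fail if anybody else already has the database open.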


Nasty Issues

The ADO Recordset's GetString method has a nasty secret, though it is not much of a secret to classic ASP scripters, who tripped over it quite early. That secret is:

GetString does not use the invariant locale and you cannot set its locale

How does this matter?

What about Boolean values? What about fractional numeric values?

It turns out that PutString has the very same limitation (or is that a feature??).

As far as I can determine through testing, the demo should work just fine even in one of the Central European locales (e.g. Germany) with funky number punctuation and different words for "true" and "false." That's because it uses the locale-aware CStr() function when building the big test String value.

However the main reasons to work with a delimited text tabular data format are (a.) persisting, and (b.) interchanging data.

So a program running on a German language machine can't use this for talking to a French language machine. A French machine can't talk to an English machine because the number formats may match but the Booleans are goofy.
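
To see the difference on the formatting side, compare the locale-aware CStr() with the older Str$() and Val() functions, which only ever use a period as the decimal point. ShowLocaleDifferences is a hypothetical routine for illustration. Keep in mind that this only covers the text you build yourself. PutString still hands Strings to ADO, which converts them by the machine's locale, so it is a demonstration of the problem rather than a fix.

Code:

Private Sub ShowLocaleDifferences()
    'Hypothetical demonstration routine, not part of the demo.
    Dim Amount As Double
    Dim Text As String

    Amount = 1.5
    Debug.Print CStr(Amount)    'Locale-aware: uses the regional decimal symbol.
    Debug.Print CStr(True)      'The case discussed above: the word may be localized.

    Text = Trim$(Str$(Amount))  'Str$() always uses a period (plus a leading space).
    Debug.Print Text
    Amount = Val(Text)          'Val() only recognizes the period, so this round-trips.
End Sub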


SetThreadLocale

The clever may think they know the answer, but calling SetThreadLocale passing LOCALE_INVARIANT won't cut it. For that matter the story is more complicated for supported versions of Windows anyway, involving SetThreadUILanguage calls.

But as far as I can tell the Variant parsing/formatting routines within OLE Automation that ADO makes use of lock in the locale pretty early and are not swayed by flippity-flopping locale settings around GetString or (my) PutString calls.