Extracting links from a screen scrape in ASP.NET (C#)

Quite often you need to extract parts of a web page you have “scraped” from somewhere else. I won’t go into the actual process of screen scraping, since it is covered in detail elsewhere on the web, but I will show you how to extract the links from the scraped page.

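The function “this.GetScreenScrapeHTML()” is not included in this example and is the part you’ll need to write yourself. Take a look at System.Net.HttpWebRequest (or System.Net.WebClient) for more information; System.Web.HttpRequest only represents the request coming in to your own page, not one you send out.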

The MatchCollection holds all of the successful matches, which you can then enumerate. By using named groups in the regular expression, you can access each captured part of a match by name.
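A minimal sketch of the approach, assuming the scraped HTML is already in a string. The class name LinkExtractor, the pattern itself and the group names “url” and “text” are my own illustrative choices, and a regular expression like this is not a full HTML parser, so treat it as a starting point rather than a definitive implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class LinkExtractor
{
    // Assumed pattern: an anchor tag with a quoted href, capturing the
    // address as the named group "url" and the inner content as "text".
    private static readonly Regex AnchorPattern = new Regex(
        @"<a\s[^>]*?href\s*=\s*[""'](?<url>[^""']*)[""'][^>]*>(?<text>.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns one (url, text) pair per anchor found in the HTML.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var links = new List<KeyValuePair<string, string>>();

        // The MatchCollection holds every successful match.
        MatchCollection matches = AnchorPattern.Matches(html);

        foreach (Match match in matches)
        {
            // Named groups are read by name rather than by position.
            string url = match.Groups["url"].Value;
            string text = match.Groups["text"].Value.Trim();
            links.Add(new KeyValuePair<string, string>(url, text));
        }

        return links;
    }
}
```

You would call it as LinkExtractor.Extract(this.GetScreenScrapeHTML()) and loop over the result, reading Key for the address and Value for the link text.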

This example does not take into account situations where the link is not text, but an image instead. If this is the case, you should use another regular expression to extract the “alt” text from the image (if available). That example I’ll dig out another time if anyone is interested!

Problems with SQL Reporting Services and MSDN Forms Authentication sample

Microsoft offers a sample application to switch SQL Reporting Services from Windows Authentication to Forms Authentication. The sample is not bad, but it is essentially a bit of a hack. We have had several problems installing it, which I thought I would document. Other developers seem to be having similar problems.

The majority of organisations will probably want to use SRS in a manner outside the basic install. This usually means that you need a custom interface to the reports, which replaces the standard Report Manager.

Samples

Microsoft offers two samples which are useful to a point. They are as follows:

Other samples and useful Microsoft information can be found here:

Examples externally from Microsoft which I found useful can be found here:

Finding Good Examples

However, even with this wealth of information, it is still pretty difficult to find good examples of what most of us want. The example Forms Authentication in Reporting Services is useful, but I think seriously flawed. It tries to address a problem that is actually inherent in Reporting Services 2000, which is that it relies on Windows Authentication and expects most users to be happy with URL access to reports.

URL access to reports is a security risk. You can’t pass hidden parameters to it: all anyone needs to do to compromise the integrity of the system is right click -> Properties -> URL. The parameters slung off the end will be obvious to most people, and UserID=x can pretty easily be changed.

This leaves you with the web service, which in my opinion should have been the only way to access SQL Reporting Services. With this focus, Microsoft would have built a much superior product. Focusing on Windows Authentication ties the product in with their server business, and it essentially becomes a nightmare for anyone to administer. We as an organisation don’t want to maintain a multitude of domain accounts so that people can access reports. We don’t have the time or the inclination to do so. We already have an application which follows the fundamental principles of Microsoft tiered development, with a security and user model built on Forms Authentication to match. We want to tie them together. If you are reading this, I’m guessing you do too!

So what is the solution? We think that for the moment we will put up with the example Microsoft has put together and use the custom security extension within SQL Reporting Services itself. It isn’t really exactly what we want, but we can’t see a way around it.

Key Problems with the MSDN Custom Security Sample

We had several problems installing this sample. They are as follows:

  • The documentation for config file changes is disjointed. It is easy to get lost switching back and forth between the ReportServer and ReportManager directories. It would have been better to go through all the changes for one directory, and then focus on the other. We missed out one of the files (you’ll get a nicely presented “Report server is not running” error in ReportManager), due to this documentation misdemeanour.
  • Forget localhost! For development, localhost is something that an ASP.NET programmer seems to be joined to at the hip. For the web service you need to remove it from your consciousness. Are you having the problem where everything seems to work with the Report Manager, you can see the new web form login page and register a user, but you just can’t seem to log in, because it bounces you back to the login page every time? It behaves as you expect when you put in an invalid user, and it tells you when you enter the wrong password? Change localhost to your_machine_name, and hey presto. It took me a long time following the cookie in debug to figure out that little nasty!

  • The Standard Edition of SQL Reporting Services will not support “Security extensions, including support for custom or forms-based authentication”, hence this sample will NOT WORK! For more information check out the various Editions of Reporting Services.
  • Verify your config files have been changed correctly. It is very easy to get lost or miss one out.

  • If you change the name of the DLL from Microsoft.Samples…., then make sure you make the matching changes in every config file that refers to it.

Where to go from here

We now have the sample working on the Developer Edition, but since it doesn’t work on the Standard Edition, a decision needs to be made about whether to continue. Working with the web service is fairly easy, and we are able to retrieve the list of reports, get report parameters and open reports. However, this still falls short of what we really need, which is a way to embed these reports in a viewer-style page, but without the URL access.

So how do you get the content from the web service’s .Render method into a separate frame? You can’t just dump it into the same page, since a PDF file won’t render like this unless you clear the buffer, and an HTML page comes back complete with HTML, HEAD and BODY tags. The HTML standard goes right out of the window on that one.

I think the solution might be to encrypt the URL access, or to use some kind of proxy system that makes a System.Net.HttpWebRequest for the report you want on the server, without revealing the URL parameters, which most people wouldn’t want to display so obviously to their users.
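To make the proxy idea concrete, here is a rough sketch, not something we have running. It assumes the report server accepts the usual URL access form (report path after the “?”, then rs:Command and rs:Format, then the report parameters), and that the parameters come from your own session or database on the server rather than from the browser. The class name ReportProxy, both method names and the server and report names in the usage note are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical helper: the page your users see calls this on the server,
// so the report URL and its parameters never reach the browser.
public static class ReportProxy
{
    // Builds a URL-access address for a report. The parameter values
    // should be looked up server-side (e.g. from the logged-in user).
    public static string BuildReportUrl(string reportServerUrl, string reportPath,
        string format, IDictionary<string, string> parameters)
    {
        var url = new StringBuilder(reportServerUrl);
        url.Append('?').Append(reportPath);
        url.Append("&rs:Command=Render");
        url.Append("&rs:Format=").Append(Uri.EscapeDataString(format));

        foreach (KeyValuePair<string, string> parameter in parameters)
        {
            url.Append('&').Append(Uri.EscapeDataString(parameter.Key))
               .Append('=').Append(Uri.EscapeDataString(parameter.Value));
        }

        return url.ToString();
    }

    // Fetches the rendered report as raw bytes using the supplied
    // credentials (an assumption: some account the report server accepts).
    public static byte[] FetchReport(string url, ICredentials credentials)
    {
        using (var client = new WebClient())
        {
            client.Credentials = credentials;
            return client.DownloadData(url);
        }
    }
}
```

For example, BuildReportUrl("http://your_machine_name/ReportServer", "/Sales/Monthly", "PDF", parameters) gives the address to pass to FetchReport. The page would then clear its response buffer, set the content type to match the format and write the bytes out, so the user only ever sees the address of your own page.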

Ideas are welcome.

XHTML – compliant links in new window

It is interesting how, when you really need something, it conveniently pops up right under your nose. Today I was looking for a good way to open a link in a new window, but the XHTML syntax check throws a wobbly on target="_blank" in anchor tags.

I then happened to start reading Fuzzy Outlines RSS feed because I noticed a new post. Yesterday Paul posted an entry about DOM scripting which, lo and behold, contains a very neat example of opening a new window. It keeps the link as a standard, Google-friendly anchor (no window.open in the markup) and is XHTML compliant.

To check it out, go to the Fuzzy Outline website.

Generate XSD from Stored Procedure

I’ve normally created XSD files manually, hand typing each field as its own element. The procedure was a little tedious, to say the least. I have been looking at ways to speed up database creation and the setup of basic project information in Visual Studio.NET.

The first thing I looked at was the Export to XML feature of Access 2000. As many of you will already know, you can link Access directly via ODBC to a live SQL 2000 database, or any other datasource for that matter. With SQL 2000, that means you can create tables, views and stored procedures using the more flexible designer in Access (shitty t-sql code, but who cares), compared to the simple editor provided by SQL Server Enterprise Manager.

I couldn’t get the Access export feature to work, so I looked at XSD.exe, part of the framework tools, allowing you to infer and create schemas. Something triggered and I remembered that the development environment uses this tool when creating datasets in Visual Studio.NET.

The result is that once your stored procedure is complete, simply add a new dataset to your solution, then connect to your local or remote database, drag the stored procedure onto the designer and hey presto – you have all the fields and data types created for you.

Very neat, yet obvious really. I don’t know how I had missed it before. It makes me realise how much of VS.NET I might not be using to full capacity.