
Application Performance Testing at MSCOM

One of the most important tasks that falls to the Test teams here at MSCOM is performance testing of our web applications.  Performance testing is a very large subject, one that probably cannot be covered in a single posting, so instead of trying to cover everything at once we are going to break up the subject and, hopefully, produce some follow-up postings.

When it comes to performance testing we start with gathering information, so let’s begin there.  We believe that to be successful in your performance testing you have to know a few things before you just jump right in.  The first pieces of information we want to determine are: 

·         What is the acceptable performance for the web application?

·         Is there any historic performance data for benchmarking?

·         What information needs to be reported about the performance testing (are there performance scorecards available from other groups/teams for comparison)?

The next step is to determine if any application specific work needs to be done before starting the performance testing or developing a performance test plan, such as:

·         Are there any application-specific barriers that need to be handled (e.g., Passport sign-in emulation or other external dependencies)?

·         What are the important areas of the application, where performance testing is critical (prioritize the areas of the application for performance testing)?

·         Does the application need component level performance testing?

·         Does the development code require any hooks in order to get any specific performance information?

The acceptable level of performance can be very tricky to determine.  Here at MSCOM we generally have to determine the appropriate level from information that we get from various groups or stakeholders.  Multiple groups and individuals have a say in what is acceptable performance for a given application.  The business owners will know how many customers or visitors they want to support while the system engineers and database administrators will know how much additional CPU, memory and disk utilization the servers can support.  All of this information then determines what acceptable performance is for our web applications.

After determining what the performance should be, we have to figure out how to measure it.  This depends a lot on what the application looks like.  Our applications do many different things, ranging from serving up static or dynamic content to querying large or small data stores, and some even send email.  This is where you really have to know what your application does in order to determine what you are going to measure and how.

There are a number of resources available to help you identify what to measure when running your performance tests.  There are books and websites that are dedicated to this subject, a couple to take a look at are:

·         Performance Testing Microsoft .NET Web Applications

·         Improving .NET Application Performance and Scalability

For a typical web application here are some of the major things that we look at:

·         Requests per second (how many HTTP requests can be handled per second?)

·         Page views per second (how many web pages can be requested per second? Since a page is almost always made up of more than one request, this number will differ from requests per second.)

·         CPU utilization (how much processing power does the web application take?)

·         Memory usage (how much memory does the application need?)

·         Throughput (how many bytes per second can be transferred?)

·         Response Time (how long does it take for a given request to complete?)

These results should be summarized and then analyzed to identify performance bottlenecks or issues. There are a myriad of other things that can and should be measured and analyzed and we encourage you to do some research on what each of these is and how it relates to your applications.
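To make this concrete, here is a minimal sketch (not our production tooling) of reading a couple of the counters above from C# using System.Diagnostics.PerformanceCounter.  The "ASP.NET Applications" and "Processor" category, counter and instance names are the standard Windows counters; the one-second sampling loop is purely illustrative.

using System;
using System.Diagnostics;
using System.Threading;

class CounterSampler
{
    static void Main()
    {
        using (PerformanceCounter requestsPerSec =
                   new PerformanceCounter("ASP.NET Applications", "Requests/Sec", "__Total__"))
        using (PerformanceCounter cpu =
                   new PerformanceCounter("Processor", "% Processor Time", "_Total"))
        {
            // The first read of a rate counter always returns 0, so prime both counters.
            requestsPerSec.NextValue();
            cpu.NextValue();

            for (int i = 0; i < 10; i++)
            {
                Thread.Sleep(1000);
                Console.WriteLine("Requests/sec: {0,8:F1}   CPU %: {1,6:F1}",
                                  requestsPerSec.NextValue(), cpu.NextValue());
            }
        }
    }
}

In practice a load test tool or Perfmon collects these counters across all of the servers under test; the sketch just shows which counter objects correspond to the items in the list above.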

The final thing that we want to mention in this post is some of the tools that we use to help us with our performance testing.  We have recently started using the latest release of Visual Studio, which includes some very cool testing features in the Team Editions.  The Team Edition for Software Testers and the Team Suite edition both contain the testing features, one of which is the creation of load tests.  This allows us to create and run our unit tests and performance tests all from Visual Studio.  In addition to running load tests, Visual Studio will collect performance counters for analysis.
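As an illustration of what those Team Edition tests look like, here is a minimal sketch of a coded web test that a load test can then drive with many simulated users.  The URL is a placeholder and the class is hypothetical, not one of our actual tests.

using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.WebTesting;

public class HomePageWebTest : WebTest
{
    public override IEnumerator<WebTestRequest> GetRequestEnumerator()
    {
        // Request the home page; the load test engine records the response time,
        // requests/sec and any failures for us while this test runs under load.
        WebTestRequest request = new WebTestRequest("http://www.example.com/default.aspx");
        request.ExpectedHttpStatusCode = 200;
        yield return request;
    }
}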

Like we said at the beginning of this posting, we are planning on putting together some more information on how we do performance testing at MSCOM so please be on the lookout for future postings.

Posted by MSCOM | 0 Comments

Why is the security team still laughing at your functional spec?

You cranked out your best functional specification ever for your Internet web application’s security.  It has staggering detail about what roles exist for the application and what operations each user can do in each role.  Every piece of data in the system has information about what each role can do with it, and each user’s actions are logged for auditing.  But when you request sign-off from the security review team, you still get that same deep, deep sigh.  What could possibly be missing?

 

1.     The Internet – Anything displayed to the user, the user can copy and share with the world.  Anything the user can download, they can post on the Internet themselves.  Are you sure you trust your users with everything you provide access to through the site?  If not, remove it.

2.     User Empowerment - What can the user get the application to do for them?  If the application sends e-mail from your company address (maybe you let users invite other users), what prevents users from leveraging your application to send spam or annoying e-mails to people?  Does all data customers enter get reviewed before it is displayed to other users?  If not, how much can malicious users damage your company reputation by misbehaving?

3.     Authentication Authority – What mechanism will customers use to authenticate with the application?  Are you going to take your business unit down the unpleasant road of creating accounts for customers and helping them reset their passwords?  If so, how much does it cost, what criteria are sufficient to reset a password, and how does the customer do it?

4.     Provisioning - How do you make sure the users invited to use the application are the users who actually get access to it?  If the wrong customer gets provisioned in a role, all your work to define who can do what is for naught.  If your business already has a single sign-on solution, how do users with existing accounts and new customers get provisioned?  One option is to create one access code for each user and let new users create accounts automatically.  The user is sent the access code in a safe way, authenticates (which might involve creating a new account), and enters the access code.  Once used, the access code is destroyed after the customer information attached to it is associated with the account.  (A minimal sketch of this flow appears after this list.)

5.     User Lifecycle – Every project has that happy phase where new users come, new features appear and management gets excited and showers you with gifts.  But what about later?  When do users get removed from the system?  For example, do you revoke all user access every quarter and make everyone reapply?  This is a poor user experience, but if you want to collect accurate marketing data about customers during registration it is great to have fresh users.  Never trust the customer to let you know who they are.  What information do you use to decide to remove customers?  Is it information the customer entered or information your team stored when you invited the user to use the application?  If your users are from business partners, how does a user get removed if they leave their company and how do you even know they left?
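Here is a minimal, in-memory sketch of the one-time access code flow from item 4.  All of the names are hypothetical, and a real implementation would keep pending invitations in a database rather than in a dictionary.

using System;
using System.Collections.Generic;

class Invitation
{
    public string Role;
    public string CustomerInfo;
}

class Provisioning
{
    // Access code -> invitation details, created when a customer is invited.
    private readonly Dictionary<string, Invitation> pendingInvitations =
        new Dictionary<string, Invitation>();

    public void Invite(string accessCode, string role, string customerInfo)
    {
        Invitation invitation = new Invitation();
        invitation.Role = role;
        invitation.CustomerInfo = customerInfo;
        pendingInvitations[accessCode] = invitation;
    }

    // Called after the user has authenticated (which may have involved creating
    // a new account) and entered the access code they were sent.
    public bool Redeem(string accessCode, string accountId)
    {
        Invitation invitation;
        if (!pendingInvitations.TryGetValue(accessCode, out invitation))
            return false; // unknown or already-used code

        // Associate the invited customer's information and role with the account...
        Console.WriteLine("Provisioned {0} as {1} ({2})",
                          accountId, invitation.Role, invitation.CustomerInfo);

        // ...then destroy the code so it cannot be used again.
        pendingInvitations.Remove(accessCode);
        return true;
    }
}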

Remember, you are the expert for your application.  Do not expect the security team to uncover everything on your schedule.  Work with them, find out what they are looking for, get feedback and learn how they think.   Your projects will run much more smoothly with you raising the issues, resolving them and confirming the resolution with the security team.  Maybe they won’t laugh at your next release, but instead give you that broad grin of satisfied respect.

Posted by MSCOM | 1 Comments

Keeping Track of Database Capacity -- Monitoring and Planning

As a DBA at MSCOM, I'm often asked how long our servers have before they run out of disk space.  This is also one of the toughest questions we get.  Any estimate depends on having a good forecast for user traffic and data volumes, and even on database internals such as when index trees will need to "branch out".   Oh, and do this for each table!  I don't know about you, but I barely know what I'm having for dinner tonight, let alone how much web traffic MSCOM will have next month or next quarter. 

 

Yet it's critically important to know this, because it impacts every application's availability and scalability.  Plus there are generally long lead times for buying and deploying new hardware.  So if we suddenly realize we are running out of space on any one of hundreds of servers, it can be a real problem. 

 

Before I describe our solution, let me also mention capacity monitoring.  Those of you who were around in the "prehistoric" days of SQL Server 6.5 will remember that all the data (including indexes) for a database was stored in one logical "data device".  It was important to have a "database capacity" alert to monitor the fullness of this device, because if it hit 100 percent you got an 1105 error and everything pretty much stopped.  That and a "log capacity" alert were all that was needed.  You just added a new file, or expanded an existing one, right up until the drives were full.

 

With SQL Server 7.0, 2000 and 2005 we now have to deal with filegroups.  That 1105 error now gets thrown when any filegroup fills up.  Of course, we want to use autogrowth whenever possible.  But my point is, isn't it filegroup capacity that matters now, and not database capacity?

 

At MSCOM, like any other large data center, we have thousands of databases and tens of thousands of filegroups -- any one of which can cause an application outage if it runs out of disk space.  Each filegroup stores a different set of tables and indexes, all growing or shrinking their data at different rates.  That's a lot of moving parts.  You can imagine keeping track of even a relatively small number of them being a full-time job.

 

Well, none of the DBAs here wanted a full time job like that.  So every day we schedule an insert of all of our filegroup sizes into a centralized data mart.  The following script is an example of code you might use to gather that information:

 

 

SELECT
      RTRIM(groupname) AS FileGroupName,
      RTRIM(name) AS 'FileName',
      RTRIM(filename) AS PhysicalFile,
      CONVERT(numeric(19,5), size*8/1024) AS 'Allocated MB',
      CONVERT(numeric(19,5), (CONVERT(numeric(19,5), FILEPROPERTY (name, 'spaceused'))*8/1024) ) AS 'Used MB',
      100*CONVERT(numeric(19,5), ((size - FILEPROPERTY (name, 'spaceused')) / CONVERT(numeric(19,5),size)) ) AS 'Percent Free'
FROM sysfiles f, sysfilegroups fg
WHERE f.groupid = fg.groupid

UNION

SELECT
      'z. All', 'All', 'All',
      SUM(CONVERT(numeric(19,5), size*8/1024)) AllocatedMB,
      SUM(CONVERT(numeric(19,5), (CONVERT(numeric(19,5), FILEPROPERTY (name, 'spaceused'))*8/1024) )) UsedMB,
      100*(1-((SUM(CONVERT(numeric(19,5), (CONVERT(numeric(19,5), FILEPROPERTY (name, 'spaceused'))*8/1024) )))/(SUM(CONVERT(numeric(19,5), size*8/1024)))))
FROM sysfiles f, sysfilegroups fg
WHERE f.groupid = fg.groupid

ORDER BY 1

 

 

Having the data stored in a table allows us to do some interesting things.  We can easily find the fullest filegroups, just by sorting on the "used" space column.  But the most useful data comes from comparing today's filegroup sizes with those measured 5 days ago.  When we divide that difference by 5, we get an average daily growth rate.  Divide the current remaining free space in the filegroup by that rate, and we get the average number of days before the filegroup runs out of space.
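As a simple illustration of that arithmetic (example numbers only, not our data mart code):

using System;

class FilegroupTrend
{
    static void Main()
    {
        double usedMBToday = 41200.0;       // filegroup space used today
        double usedMBFiveDaysAgo = 40450.0; // space used five days ago
        double freeMBRemaining = 9800.0;    // space still free in the filegroup

        double dailyGrowthMB = (usedMBToday - usedMBFiveDaysAgo) / 5.0;
        double daysRemaining = dailyGrowthMB > 0
            ? freeMBRemaining / dailyGrowthMB
            : double.PositiveInfinity;      // not growing

        Console.WriteLine("Growing {0:F1} MB/day, about {1:F0} days remaining",
                          dailyGrowthMB, daysRemaining);
    }
}

The same arithmetic, summed across a server's filegroups and divided into the free drive space from xp_fixeddrives, gives the server-level "days remaining" described below.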

 

Operationally, all we have to do is sort on this "days remaining" column and filegroups running out of space soonest jump to the top of the list -- meaning we can proactively increase them manually or ensure that they can autogrow.  Instead of agonizing over thousands of dynamically growing pieces, we only focus on the few becoming critical.  Or the few with suspiciously high growth rates. 

 

How does this help us with our original question of how to predict when a server will run out of space?  We compute a server-level "days remaining" metric by adding up that server's filegroup growth rates and dividing by the remaining free drive space (which we gather by running xp_fixeddrives each day and inserting into the mart).  So we can answer WHEN we will hit bottom.  Let's say "Server-X" has 90 "days remaining".  Knowing that gives me plenty of time to order new hardware, or get developers thinking seriously about purge processes.  Or let's say someone asks, why is our data outgrowing our brand new, expensive server so fast?  We can look at the filegroup growth rates and drill down to the tables and indexes causing the behavior in question.

 

I hope I've shown you how useful it can be to gather and analyze file growth data.  Depending on your business needs, it can be simple or complex.  I'd like to end my discussion by asking your opinion.  Does monitoring overall database capacity matter any more, or is filegroup and server capacity all we really need to care about?

Posted by MSCOM | 1 Comments

Is anyone watching the health of your Multi-tier Web Application?

Do you know if each tier of your application is healthy?  This is especially challenging when you have a multi-tier application (e.g., web site, web service, and database).  When there are problems on the web site, you need to be able to quickly identify which tier is causing them.

 

One of the main challenges an operations team faces is monitoring applications.  When an application is deployed, the operations team needs to know that things are working.  To know that things are working, the application needs to have monitoring set up.  There are many monitoring solutions in the industry that can help check application health.  However, these monitoring solutions do not automatically know what needs to be monitored.  Our team uses Microsoft Operations Manager for event collection and a custom-built tool for the application monitoring pages. 

 

A good monitoring best practice is to identify the critical features and dependencies of an application, ideally while the application is being developed.  This allows the development team to create “monitoring hooks” around those features and dependencies.  When documented by project teams and configured for monitoring by Operations, these built-in hooks enable faster issue resolution times. 

 

Best practices: Application Monitoring Pages & Reporting

  1. A monitoring page should test for “success” of a critical feature (or features) or dependency and report via an HTTP status code – 200 for success and > 599 for an application-specific failure.  (A minimal sketch of such a page follows this list.)
  2. A monitoring page can be used to create a multi-step test where each step tests a specific piece of functionality.  Each step can then return an HTTP status code of 200 upon success or > 599 upon failure of that step.
  3. Document each of the “monitoring test” steps in the monitoring page, the related HTTP status codes and the action to be taken when a specific non-200 status code is returned to the monitoring pages.
  4. Apart from returning a non-200 status code, monitoring pages can write events into the event logs (see the event-logging best practices below) to provide more specific information about the error for further troubleshooting.
  5. The monitoring pages can also render more detailed error information that can be viewed by the operations engineer.
  6. A monitoring page representing critical features and dependencies can be used to report on overall availability of the system, if required.
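As an illustration of practices 1 and 2, here is a minimal sketch of a monitoring page implemented as an ASP.NET HTTP handler.  The dependency check and the 600 status code are hypothetical examples of an application-specific failure, not our actual monitoring tool.

using System;
using System.Web;

public class HealthCheckHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        try
        {
            CheckDatabaseDependency();          // hypothetical critical-dependency check
            context.Response.StatusCode = 200;
            context.Response.Write("OK");
        }
        catch (Exception ex)
        {
            context.Response.StatusCode = 600;  // application-specific failure code (> 599)
            context.Response.Write("Database check failed: " + ex.Message);
        }
    }

    public bool IsReusable { get { return true; } }

    private void CheckDatabaseDependency()
    {
        // e.g. open a connection and run a trivial query; throw on failure.
    }
}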

 

Best Practices: Event-logging & Reporting

  1. The default should be to write only actionable events to the event log – anything informational or warning-level should not be written to the logs as an *error*.
  2. Where additional non-actionable information needs to be collected, allow for a switch that can be set on demand to turn on informational/warning entries.
  3. Ensure the combination of Event ID and Event source is always unique.
  4. Ensure Event sources have the application name as a qualifier (Event source=”ApplicationName_ErrorReadData”) in order to avoid conflicts with other applications on the same server that have the same feature.
  5. Ensure event text is descriptive enough that appropriate action can be taken.
  6. Document the Event IDs and Event sources used for an application and provide troubleshooting steps to resolve these errors – this is the key to monitoring and resolving issues with quick turnaround times for that application.
  7. In situations where an error is generated many times, have the application write only every 10th occurrence (a value that can be configured through a config file) of that error to the event logs.  (Note: the first time an error happens it is always written to the log; subsequent similar errors are counted.)  This ensures the event log does not fill up with the same error, resulting in the loss of other valuable information.  (A sketch of this throttling appears below.)
  8. Use unique (custom) event logs per application where required.

Do not log PII (personally identifiable information) or passwords to the event logs.
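Here is a minimal sketch (hypothetical names, not our monitoring library) of an application-qualified event source combined with the every-Nth-occurrence throttling described in item 7.

using System;
using System.Collections.Generic;
using System.Diagnostics;

class ThrottledEventLogger
{
    private const string EventSource = "MyApplication_ErrorReadData"; // app-qualified source
    private readonly int writeEveryNth;                               // e.g. read from a config file
    private readonly Dictionary<int, int> occurrenceCounts = new Dictionary<int, int>();

    public ThrottledEventLogger(int writeEveryNth)
    {
        this.writeEveryNth = writeEveryNth;

        // Creating a source requires sufficient rights; normally done at install time.
        if (!EventLog.SourceExists(EventSource))
            EventLog.CreateEventSource(EventSource, "Application");
    }

    public void LogError(int eventId, string message)
    {
        int count;
        occurrenceCounts.TryGetValue(eventId, out count);
        count++;
        occurrenceCounts[eventId] = count;

        // The first occurrence is always written; after that, only every Nth.
        if (count == 1 || count % writeEveryNth == 0)
        {
            EventLog.WriteEntry(EventSource,
                message + " (occurrence " + count + ")",
                EventLogEntryType.Error, eventId);
        }
    }
}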

 

Some additional things to keep in mind:

·         Make sure that you are monitoring each tier of the application.

·         It IS possible to over-monitor an application.  If you have too many monitors on the system, you can start taking system resources away from the application.  You have to find the right balance of monitors to system resources.

·         Treat monitoring as a feature in the application development process.  This helps ensure that it is documented and “Monitoring hooks” can be created.

·         Any monitoring tool that can check the status code returned in the header of a web page can be used for these application monitoring pages.

Posted by MSCOM | 0 Comments

Scrum De-Mystified, Not So Bad After All

It all started this past summer, the summer of 2006. I started hearing about the latest sure-fire recipe for conducting software development projects: Scrum. Of course this new-fangled methodology came with many new terms, techniques, and rules to learn. After developing software for twenty-plus years now, I am wary of new crazes in this business, so I kept my distance and did not pursue it.

 

As fall approached I heard increasing buzz about Scrum. Then finally the request came from my management to try it. I would dive in, wary or not, and I was to follow this new process with no formal training – learn on the job. The project leader, the “Scrum master”, was the only one trained in this practice, and would guide us. Alright, time for this old dog to learn another new trick.

 

Quickly I had to ramp up on Scrum. I asked co-workers and read many Web articles to “bone up” on it. I quickly learned there was a fair amount of disagreement on exactly how Scrum is practiced. Great! So then came the challenge of forming opinions on which aspects I felt were helpful, and which were overkill or a bad fit for the scope of the internal tools development project I was to work on. A big concern I had was the possibility of Scrum -- with its daily project tracking meetings -- being too heavy on the micromanagement.

 

Before we go any further, I will try to de-mystify Scrum, using my own words, for those who still could use the explanation.

 

Scrum is a relatively new agile software development methodology in which you repeatedly develop usable components in planned cycles of about a month, called sprints. It is highly collaborative, with emphasis placed on team success, and software features evolve via frequent team interaction. It is called lightweight because it focuses on completing limited-scope features based on limited, high-level functional requirements. It is light on documentation, which can be addressed in follow-on sprints. The goal is to progressively build software, piece by piece, with each piece being a demonstrable, functioning unit. As a team completes more sprints, it gets better at working together, communicating, estimating, and ultimately self-management. Both the software and the team evolve.

 

Older software development methodologies include waterfall (1970), rapid application development (1980s), and “cowboy coding”, a methodology-free approach that has probably been practiced since computer programming began. Like rapid application development (RAD), Scrum is a form of iterative development. The software is evolved via some degree of trial and error. But I would call Scrum a more controlled form of iteration. I need to say this because RAD is now widely regarded as bad practice. It is too much like cowboy coding, with the rapid prototyping leading to poorly thought-out, limited, and over-customized designs.

 

The iterative, exploratory nature of Scrum deals well with ambiguity and the reality of not being able to foresee all software requirements and technical challenges before starting development. You get to start coding with rough estimates on small units of work. A task may take more or less time than estimated, and that is just fine, as long as the team completes some demonstrable, high-priority task before each sprint ends. You work off of a prioritized running backlog of requirements.

 

Ok, so how did it work for me?

 

Very well! And I am in my third sprint now.

 

First, our daily meetings are cool. They are short and focused. No more being asked every hour or two if you are “on track”. The meeting “pain” is predictable and limited. Measurable project progress becomes highly apparent to all involved. I get to focus on my work, with limited interruptions, by design. A key element here is emphasis on not breaking the work/thought flow of each worker. You avoid multitasking and taking change requests in order to focus on completing your sprint tasks. We come together for the quick huddle each day, then back to work. We did our meetings right after lunch, after we were already on a break from work.

 

Next, I like coding with rough designs, and rough estimates. No more pretending you know exactly what each piece will do, how it will do it, and how long it will take before entering a do-or-die six month death march, living up to your scary wild guesses. Taking it one month at a time is much more manageable.

 

Finally, the team building is great. Another key Scrum element is for the teams to self-manage. Because we work collaboratively with frequent interactions, we need less guidance from any outside people. Ideally, the Scrum master plays the role of record keeper, coordinator, and is an objective non-stakeholder. She keeps the outside world informed of progress, via integrated burndown charts, and keeps us from being distracted by outside influences like the business owners or managers.

 

Not so bad after all!

 

Recently I took a three day class on Scrum to fortify my knowledge but found out we were doing just fine without the formal training. Scrum seems more about philosophies and attitudes than strict process. You can adopt the specific practices that work for your team. Adaptive processes, adaptive emergent designs, lean engineering. This old dog keeps evolving, and so can you.

 

(Editor’s note: Microsoft Press (Best Practices series) has a great book out by Ken Schwaber entitled Agile Project Management with Scrum, ISBN-13: 978-0-7356-1993-7.)

Posted by MSCOM | 1 Comments

Tell me how I’m measured…and I’ll behave accordingly

(This is the initial blog post from the MSCOM Service Management Team. This team is an essential resource for our group. They are responsible for on-boarding new customers to MSCOM, working with existing customers to provide guidance and smooth the way for releases, interfacing with our Data Center providers and provide a whole array of other functions.)

 

Without clear measures, individual groups and/or individuals will decide for themselves what measures they use for making decisions and for gauging their success…leading to a disjointed organization and misaligned customer relationships.  For example, your Operations group may measure an application’s availability on a 24/7 clock while the customer owning the application really only cares about the application’s availability during discrete time windows throughout the day.  Without common, agreed-upon measures, both parties will doubt each other’s data and blame the other for not doing their job, while in the meantime making little or no progress on objectively addressing any existing issues.  (That example isn’t fictitious either.)

 

Measures should be viewed as a process outcome, and fortunately most processes only have a handful of inputs that impact the outcome.  Our ultimate goal is to identify and control those specific inputs using a dashboard with control charts so that our service monitoring is more proactive and less reactive.  By understanding if critical inputs are in or out of control we can predict if the outcome will be in or out of control.
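As a simple, generic illustration of that control-chart check (example values only, not MSCOM data), the classic test is whether a new observation falls outside the baseline mean plus or minus three standard deviations:

using System;

class ControlChartCheck
{
    static void Main()
    {
        // Baseline samples of a process input believed to be stable,
        // e.g. daily "% of incidents resolved within SLA".
        double[] baseline = { 98.2, 99.1, 97.8, 98.5, 99.0, 98.7, 98.3 };

        double mean = 0;
        foreach (double s in baseline) mean += s;
        mean /= baseline.Length;

        double variance = 0;
        foreach (double s in baseline) variance += (s - mean) * (s - mean);
        double stdDev = Math.Sqrt(variance / baseline.Length);

        // Classic 3-sigma control limits.
        double upperLimit = mean + 3 * stdDev;
        double lowerLimit = mean - 3 * stdDev;

        double todaysValue = 92.1; // the new observation to check
        bool outOfControl = todaysValue > upperLimit || todaysValue < lowerLimit;

        Console.WriteLine("Limits {0:F2} .. {1:F2}; today {2:F1} is {3}",
                          lowerLimit, upperLimit, todaysValue,
                          outOfControl ? "OUT of control" : "in control");
    }
}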

 

So how do you figure out what the right measures are?  Understanding your customers and your organization’s goals and major processes sure helps, but here is a simple method to jumpstart defining the measures you and your group need.

 

1.     Define the scope for your measures.  Is it for your CEO, Director, middle manager, worker bees, customers, etc?

2.     Identify the questions the user needs answered.  For us, as an IT Operations group, there undoubtedly needs to be a mix of operationally focused and business-focused questions.

3.     Clarify why answering the question is important with bullet points or brief sentences.  This will not only help validate the question but also help you understand if the question is at the right level for your target audience.

4.     Identify the specific measure, its dimension/s, and target.

·         Measures are the data point we’re interested in.

·         Dimensions are ways in which you want to be able to slice and view the measure’s data.

·         Targets are the goals that you have for that measure.

·         Here’s an example showing how all three elements fit together.  If the measure is “% of Incidents Resolved within SLA,” the dimensions you may want to see the measure by could include customer, application, incident priority, and time (quarter, month, week).  The target would be pulled directly from your SLAs (Service Level Agreements).

5.     Identify other questions that would be raised by this measure.  These questions can help you identify additional measures for your target audience as well as identifying linked measures for other audiences either up or down your company’s chain.

6.     Determine what actions could be taken with this measure’s information.  If people don’t know how the information helps them make decisions or take action they’ll undoubtedly ignore the measure.

7.     Identify potential data sources for the measure.

8.     Identify the data owner.  Or in other words, who is responsible for making sure the data is collected consistently and remains up to date.  The data owner isn’t necessarily also responsible for the good or bad news you’ll see with the measure.

9.     Identify any potential data collection and reporting issues.  Maybe a tool exists to collect the data you need but it’s entered inconsistently so the process will have to be fixed before you can accurately report on the measure.  Maybe the data you need isn’t currently collected at all.  Clarifying issues such as these will help you determine what reporting can be built immediately and what reporting will require some additional planning and efforts to make available.

 

This linked table gives a generalized example of how all this information looks together.  With a clear understanding of what measures you need and why, you’ll now be able to more easily explore options for effectively visualizing the information in charts, graphs, histograms, etc.  If you define the picture before understanding the measures needed there’s a good chance of ending up with less than optimal reporting.  If you put in the effort, your group and customers won’t doubt the data. 

 

Posted by MSCOM | 3 Comments

Sunset…Do You Know When to Prune the Application Tree?

(This is the initial blog post from the MSCOM Operations and Portal Release Management team.)

 

When it comes to controlling costs, one of the best places to start is by managing your application portfolio.

 

Recently, the MSCOM Release Management team started looking at making improvements in Portfolio Management – not just the portfolio of applications being developed by the teams that we work with, but also the applications that are currently deployed in the MSCOM operational environment. What we found – a decentralized system, focusing on developing and deploying new applications – made us realize that a deeper look into the overall portfolio was needed.

 

Placing checkpoints onto adding applications to the Application Portfolio slows the growth of the portfolio, but does not stop it. In order to fully manage the Application Portfolio, another piece is needed -- Software Rotation (also known as "sustained engineering"). In the "Unified View of the MSCOM Application Universe," once an application has been deployed into production, and the development teams are no longer actively working on replacement versions, "ownership" of the application is assigned by the portfolio managers to the "Software Rotation" team, who are chartered with performing maintenance on applications that satisfy critical business needs, but which are not slated for additional feature work. If a feature request comes in for one of these applications, it is referred to the appropriate development team by the portfolio managers. The Software Rotation team also periodically reviews its suite of applications to identify candidates for retirement, and works with the steering committee to determine sunset dates.

 

Centralized application portfolio management provides many benefits to MSCOM teams and customers, including the ability to track costs, better consistency and cost reductions. Product development teams benefit too, because they are able to stay focused on delivering customer value through new applications and features.  This allows the product development teams to focus on delivering value releases to customers, rather than being randomized by having to fix old applications. Adding value, after all, is what we’re all here to do.

 

Posted by MSCOM | 0 Comments

Windows Live ID Adoption Solution in Microsoft.com

(This is the first blog post from the Test Team that is working on the new Microsoft.com portal site. Testing is critical in the process of getting new code into production. Authentication and authorization can be especially interesting issues to test.)

Passport has long provided the single sign-on experience as the authentication service across web sites on Microsoft.com.  The Passport team plans to turn off support for Passport Manager 2.5 (PPM 2.5) and replace it with the newer Windows Live ID Relying Party Suite (RPS).  RPS 4.0 supports IIS running in 32-bit mode; soon we will have RPS 4.5 supporting IIS running in 64-bit mode. 

As a provider of authentication and authorization services to the Microsoft.com portal site, the Portal Secure Team encountered a series of issues upgrading from PPM 2.5 to RPS 4.0 and RPS 4.5.

Here are some of the issues we are facing:

1.       How do we enable adopters to minimize the changes to their code for such a migration?

2.       How do we handle two different sites, A and B, using the same Site ID and the same cookie encryption key (CEK)?  How does Passport handle the redirects when a user signed in to site A is considered invalid due to security level, time window, or even a different Passport version?

3.       How do we handle two different sites, A and B, using different Site IDs or different cookie encryption keys (CEKs)?

4.       How do we handle two different sites with site A running PPM2.4 and site B running RPS?  RPS does not recognize PPM cookies, and vice versa.

5.       How do we allow adopters to continue using PPM, and keep everything in good working condition when the switch happens?  How do we provide a seamless transition?

6.       How do we handle the redirect to our authentication page when a user is not signed in to Windows Live ID in the scenarios described above?

The team has built a Passport Authentication Library (PAL) to ease the pain of the Windows Live ID adopters.  Basically, the PAL has wrapped the RPS APIs that are useful in our adoption scenarios and created an HTTP module to provide a unified authentication solution for Microsoft.com. 

Below are code snippets showing that the PAL handles, behind the scenes, much of the implementation required by RPS and gives adopters a look and feel similar to the old implementation.

// Before PAL:

using System.Web.Security;
...

PassportIdentity pi = User.Identity as PassportIdentity;
if (pi != null && pi.IsAuthenticated)
{
      int puidLow = (int)pi.GetProfileObject("MemberIDLow");
      int puidHigh = (int)pi.GetProfileObject("MemberIDHigh");
      // ...
}


// After PAL:

using Microsoft.MSCOM.MemberServices.Passport;
...

PassportUser pu = PassportUser.Current;
if (pu != null && pu.IsAuthenticated)
{
      PassportID puid = pu.Puid;
      // ...
}
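The HTTP module mentioned above is the piece that makes PassportUser.Current available to page code.  Here is a minimal sketch of what such a module could look like; the class name and the commented-out calls into the RPS wrapper are hypothetical, not the actual PAL implementation.

using System;
using System.Web;

public class PassportAuthenticationModule : IHttpModule
{
    public void Init(HttpApplication application)
    {
        // Run the authentication step early in the pipeline for every request.
        application.AuthenticateRequest += new EventHandler(OnAuthenticateRequest);
    }

    private void OnAuthenticateRequest(object sender, EventArgs e)
    {
        HttpApplication application = (HttpApplication)sender;

        // Hypothetical call into the RPS wrapper: validate the RPS cookies on the
        // request and build the user object that PassportUser.Current hands back
        // to page code later in the request.
        // PassportUser user = PassportUser.FromRequest(application.Context.Request);
        // application.Context.Items["PassportUser"] = user;
    }

    public void Dispose()
    {
    }
}

A module like this is registered in the adopting site's web.config rather than in page code, which fits the goal in issue 1 of minimizing the code changes adopters have to make.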

 

Here is a peek into one of the complex business scenarios we face:

 

Scenario: A User Accesses a Secure Public Page

Frequently, enterprises get stuck when implementing new releases and new innovations of their software; it is never an easy task to upgrade server-side components.  However, the PAL provides an example of how to bridge that gap with less pain.  We have gained tremendous experience in making such efforts.

Posted by MSCOM | 1 Comments

We Herd Cats

(As the Microsoft.com team has expanded its role from being operations based to taking on the additional role of software design and development, we thought it would be instructive to get our PM’s to talk about what they do and how it helps get the bits into production.)

 

So what value does a PM provide to a project? Don’t they just add an additional layer of complexity to the software development cycle? Can’t a project just be run by a dev and a tester and installed and run by a Systems Engineer? Ah, I have heard all these questions over the years at Microsoft and, I have to tell you, I always go back to this expression: “we herd cats.”

 

Program management is all about taking customer requirements, analyzing market data, reviewing new technologies and making technology bets to create innovative, high-quality products.  The end goal is to ensure our products meet customer needs and provide customer VALUE. To get all this done, Program Managers have to interface with Business Teams, other Microsoft Teams, Testers, Developers, Release Management, Operations and Support. Hence the expression “we herd cats”.

 

The best way to give you insight into our crazy world of Program Management is to provide an example. The one that comes to mind first is our recent adoption of Microsoft Office SharePoint Server 2007 (MOSS) as our platform for the http://www.microsoft.com homepage and, over time, all subsidiaries. This adoption is by no means complete; we are actually just getting the ball rolling.

 

Let’s go back about 6 months ago. Our business team was looking for a new publishing system to manage content updates to the Microsoft.com homepage. This publishing system needed to support sharing page content and layouts between multiple sites and locales.  The driver behind the initiative was the projected cost savings derived by significantly reducing duplicated content, duplicated localization efforts for that content, and the hours it took to produce a final home page for Microsoft.com.

 

When our business team came to the Microsoft.com technology team with these requirements, they also came to us with a date that they wanted these requirements implemented. Conversations were initiated on what the best approach should be - adopting an existing publishing platform or creating a new one. It was agreed that the PM’s and Group Leads in the technology team would go research both the best short term and long term approach with the goal being finding the best, most cost-effective solution.

 

To kick off the project, Program Management spearheaded research into and discussions on a number of internal tools and retail products.  We have an amazing depth of knowledge across the Microsoft.com team and across the company, and one of our jobs as program managers is to tap that knowledge effectively.  These early exercises included consulting with our developers, testers, and operations staff who knew the current Microsoft.com system and had expertise on what we needed going forward (permissions model, content management, workflow, etc), as well as interacting with the MOSS team and other product group contributors.

 

The best design and planning in the world falls short if the bits can’t be installed and run in our production environment and service our end users. To facilitate this we consciously engaged very early in the project with our Operations and Release Management teams. It was critical for us to know where and how these cool new bits would be deployed. To ensure that our proposed solution would be able to support our external customer needs, we gathered information related to current and projected site traffic.  Additionally, we considered our infrastructure knowing the bits would most likely be installed on a shared environment.

 

As microsoft.com is the 4th largest website in the world, we were looking for a solution that supported most of our business needs in a short ship cycle.  Other factors included Microsoft-wide long term strategic initiatives, and “dogfooding” our company’s products.

 

Almost immediately the entire team threw out the idea of building a new publishing system from the ground up. We knew such an approach would have a high resource and time cost, and thus not satisfy the customer.

 

We also identified the areas where some of the solutions that our team was already leveraging were not extensible or were incomplete. It would also take some time to “stitch” together a solution that might or might not be supported long term. The obvious winner was MOSS 2007 (Microsoft Office SharePoint Server) for the following reasons:

  1. Good Strategic bet - MOSS is a Microsoft technology, built on Microsoft technologies
  2. Integrated content management store (SQL 2005 + CMS)
  3. Integrated authentication & authorization for content authors (Windows Server 2003, SharePoint, Active Directory)
  4. Integrated workflow & business logic engine (Windows Workflow Foundation)
  5. Integrated template / content-typing support (SharePoint)

From this investigation, we partnered with the MOSS team to enable our scenarios.

 

At this time we broke out into feature groups, with one PM assigned to each feature area. We had one PM responsible for the schedule, keeping everyone on task and informed on what was being delivered when, and communicating feature tradeoffs when appropriate. We had several other program managers who ensured that, as we developed our 1.0 product, all key scenarios were addressed.

 

To align with the tight ship schedule we simultaneously altered our software development methodology. We settled on a hybrid model of waterfall and Scrum. It became every PM’s responsibility to communicate this methodology, ensure Dev and Test were on board, and keep the business in the loop. As the project matured, a rhythm was established where the PMs owned communicating out to the business on what features were being developed, enabling business review of those features, managing communication up to upper management on the status of the project, ensuring the developers and testers were getting the information they needed, and ensuring that everything stayed on schedule.

 

Throughout this experience the PM team lived the adage “we herd cats.”

 

It’s Nov 1st and test sign-off is right around the corner for the 1.0 version of our MOSS-enabled platform. Stay tuned for its launch Nov. 7th at www.microsoft.com. It goes without saying that our entire team has learned from this experience. We’re all super excited about integrating SQL 2005 and MOSS 2007 into the 4th largest website in the world.

 

This is a huge win for our customers as well as for Microsoft.

 

For more information on MOSS 2007:

http://www.microsoft.com/office/preview/servers/sharepointserver/highlights.mspx

 

More information on Agile Project Management with SCRUM:

http://www.microsoft.com/MSPress/books/6916.asp

 

 Microsoft.com Beta:

http://labs.microsoft.com/en/us/default.aspx

 

Posted by MSCOM | 0 Comments

Yes, We Recycle!...That Includes AppPools

And I am not just talking about aluminum and polystyrene.   Application pool recycling is a great feature of IIS 6, but how do you know if it is helping you or hurting you?  It’s great for availability and reliability, but it can also mask real problems with the code running in your applications. If you are recycling too quickly based on memory limits, you may have a memory leak that should be looked at; if you are recycling due to ping failures or other errors, you may have a more serious issue to look at. 

 

We monitor our IIS worker process uptime to measure the general health of our applications and to make capacity planning decisions (how many apps per app pool, etc.).  Process startup and .NET Framework initialization have a performance cost, so if you are recycling too frequently this can affect your web servers’ throughput.  We’ve had our share of misbehaving applications that have recycled as frequently as every 5 minutes or just flat out crashed the worker process.   You can set application pools to recycle based on time, number of requests, or virtual and private memory limits.  Application pools may also recycle for health reasons (ping failure, fatal communications error, etc.), and there’s always the admin recycle or the config-change recycle.  We have many applications (sometimes hundreds) spread out across several app pools.  We use our worker process uptime data to check if we have a misbehaving application out there (maybe someone published some bad code) and to make decisions on how many app pools we can run, which applications to put into separate app pools and how many applications an app pool should have.

 

How do you verify and monitor what is really happening?

 

Well, the first thing you have to do is log all recycle events before you can start collecting and analyzing the data.  To enable recycle event logging, if you haven’t already:

 

cscript adsutil.vbs Set w3svc/AppPools/LogEventOnRecycle 255

 

This enables all recycle events for all AppPools to be logged in the system event log.

 

More information  on Application Pool recycling events and logging is available in the following KB article:

 

332088  How to modify Application Pool Recycling events in IIS 6.0

http://support.microsoft.com/default.aspx?scid=kb;EN-US;332088

 

(Note: there may be an error in the above article; I use event ID 1117 for private memory.)

 

Once you have logging enabled you can start collecting this info and creating baselines for your application pools.  One of our favorite tools, Log Parser, is great for collecting this data. 

 

For more information on Log Parser:

http://www.microsoft.com/technet/scriptcenter/tools/logparser/default.mspx

 

 

You can use a simple log parser query to get the recycle events:

 

Logparser "Select top 100 to_string(TimeGenerated, 'MM/dd/yyyy hh:mm:ss') as dateTime from \\SERVERNAME\System where SourceName = 'W3SVC' and EventID in (1009;1010;1011;1074;1077;1078;1079;1080;1117) and Message like '%DefaultAppPool%'"

 

This query will give all the recycle and process failure events for the application pool named like defaultapppool. I’ve included a few other events in this query that aren’t necessarily recycle events but worker process failure events that do affect the uptime of your worker process.

 

Here is a list of events that I am interested in:

1009 = Worker process terminated unexpectedly
1010 = Worker process failed a ping
1011 = Worker process suffered a fatal communications error
1074 = Recycled based on the defined time limit
1077 = Recycled based on the virtual memory limit
1078 = An ISAPI reported unhealthy
1079 = An admin requested a recycle
1080 = A config change required a recycle
1117 = Recycled based on the private memory limit

 

That’s a good start but I want to see why we recycled as well, so this query is a little better:

 

Select top 100 to_string(TimeGenerated, 'MM/dd/yyyy hh:mm:ss') as dateTime,
case EventID
      when 1009 then 'UnExpEnd'
      when 1010 then 'PingFail'
      when 1011 then 'FatalComErr'
      when 1074 then 'TimeLimit'
      when 1077 then 'VMem'
      when 1078 then 'ISAPIUnHealth'
      when 1079 then 'Admin'
      when 1080 then 'ConfigChange'
      when 1117 then 'PMem'
end as Reason
from \\SERVERNAME\System
      where SourceName in ('W3SVC';'WAS') and
      EventID in (1117;1080;1079;1078;1077;1074;1011;1010;1009) and
      Message like '%defaultapppool%' and
      TimeGenerated > to_timestamp('2006-01-01 00:01:01','yyyy-MM-dd hh:mm:ss')

 

Save this as wpuptime.sql and run “logparser file:wpuptime.sql”; this will give you an output like:

 

dateTime            Reason
------------------- ------------
09/23/2006 01:18:51 ConfigChange
09/23/2006 01:19:52 VMem
09/23/2006 01:20:04 ConfigChange
09/28/2006 09:28:03 Admin
10/04/2006 18:24:16 ConfigChange
10/05/2006 07:50:02 PingFail
10/05/2006 08:21:26 PingFail

 

That’s better: I can see the event date/time and reason, but I still want more info.  What is the average uptime for the app pool I am interested in?  What was the shortest recycle time, what was the longest, and what is the count of each event?  To go further with this, I turn to VBScript and invoke the Log Parser COM object.   To use the Log Parser COM object you must register the DLL on the machine where you will run the script: 

 

Regsvr32 logparser.dll

 

Here’s an example script that uses the above query and calculates the info.  It doesn’t have much error checking and the output isn’t exactly pretty, but I did say it was an example, right?

 

' WPUPTIME.VBS - Queries event logs for W3SVC Recycle events (last 100) and calculates up time stats between events

' Requires LogParser.dll to be registered

' 1009 = TermUnExp

' 1010 = PingFail

' 1011 = FatalComErr

' 1074 = TimeLimit

' 1077 = VMem

' 1078 = ISAPIUnhealthy

' 1079 = Admin

' 1080 = ConfigChange

' 1117 = PMem

'

Dim cConfChg, cAdmin, cISAPI, cVMem, cPMem, cTimeLimit, cFatalComErr, cPingFail, cTermUnExp

Dim myQuery

Dim myInputFormat

Dim MinDate

Dim recordSet

Dim lastD, curD, oldD, firstD

Dim uptime_avg, uptime_mins, uptime_mins_last

Dim reccount

Dim timetotal

Dim MinTime

Dim MaxTime

Dim strReason, strReasonLast

Dim strComputer

Dim strAppPool

Dim szQuery

 

WScript.Echo

 

ParseCommandLine()

 

MinDate = CStr("2006-01-01 00:01:01")

 

Set myQuery = CreateObject("MSUtil.LogQuery")

Set myInputFormat = CreateObject("MSUtil.LogQuery.EventLogInputFormat")

myInputFormat.direction = "BW"

 

Call QueryServer(strComputer, strAppPool)

           

CalcUpTime()

                                               

Wscript.Echo strComputer, "," , strAppPool, "Avg Uptime:", uptime_avg, "minutes"

WScript.Echo

WScript.Echo "Count of Recycle Events"

WScript.Echo "VirtualMem:" + Chr(9)+  CStr(cVMem)

WScript.Echo "PrivateMem:" + Chr(9) +  CStr(cPMem)

WScript.Echo "TimeLimit:" + Chr(9) + CStr(cTimeLimit)

WScript.Echo "PingFail:" + Chr(9) + CStr(cPingFail)

WScript.Echo "Admin:" + Chr(9)+ Chr(9) + CStr(cAdmin)

WScript.Echo "FatalComErr"  + Chr(9) + CStr(cFatalComErr)

WScript.Echo "TermUnExp" + Chr(9) + CStr(cTermUnExp)

WScript.Echo "ConfChg" + Chr(9)+ Chr(9) + CStr(cConfChg)

WScript.Echo "ISAPI" + Chr(9) + Chr(9) + CStr(cISAPI)

WScript.Echo

 

Set myInputFormat = Nothing

Set myQuery = Nothing

set Locator = Nothing

 

WScript.Quit(0)

 

Sub CalcUpTime()

 

WScript.Echo("RecycleTime        Uptime(mins) Reason")

 

reccount = 0

timetotal = 0

MinTime = 1000

MaxTime = 0

cConfChg = 0

cAdmin = 0

cISAPI = 0

cVMem = 0

cPMem = 0

cTimeLimit = 0

cFatalComErr = 0

cPingFail = 0

cTermUnExp = 0

 

Do While recordSet.atEnd() <> True

           

            Set record = recordSet.getRecord()

            curD = CDate(record.GetValue(0))

            strReason= CStr(record.GetValue(1))

           

            If reccount <> 0 then    

                        uptime_mins = DateDiff("n", curD, lastD)

                        WScript.Echo lastD, "     ", uptime_mins, strReasonLast

                        timetotal = timetotal + uptime_mins

           

                        if MinTime > uptime_mins Then

                                                MinTime = uptime_mins

                        End if

                       

                        if MaxTime < uptime_mins Then

                                                MaxTime = uptime_mins

                        End if

                       

            Else

                        firstD = CurD

            End IF

            Select Case strReason

                        Case "VMem"                cVMem = cVMem + 1

                        Case "PMem"                cPMem = cPMem + 1

                        Case "Admin"    cAdmin = cAdmin + 1

                        Case "ISAPIUnHealth" cISAPI = cISAPI + 1

                        Case "TimeLimit" cTimeLimit = cTimeLimit + 1

                        Case "PingFail"  cPingFail = cPingFail + 1

                        Case "FatalComErr"  cFatalComErr = cFatalComErr + 1

                        Case "UnExpEnd"  cTermUnExp = cTermUnExp + 1

                        Case "ConfigChange"                cConfChg = cConfChg + 1

            End Select

           

            lastD=curD

            reccount = reccount + 1

            strReasonLast = strReason

            uptime_mins_last = uptime_mins

           

            recordSet.moveNext()

 

            If Err.number <> 0 Then

                        WScript.Echo "Error in CalcUpTime: ", Err.Description

                        Err.Clear

                        Exit Do

            End If

           

Loop

 

                       

oldD = lastD

uptime_avg = Round(timetotal / (reccount-1))

 

 

WScript.Echo ""

WScript.Echo "Time Span: " + CStr(oldD) + " through " + CStr(firstD )

 

WScript.Echo "MinTime:", MinTime

WScript.Echo "MaxTime:", MaxTime

WScript.Echo ""

 

 

End Sub

 

Sub QueryServer(tmpComp, tmpAppPool)

 

szQuery = "Select top 100 to_string(TimeGenerated, 'MM/dd/yyyy hh:mm:ss') as dateTime, case EventID when 1077 then 'VMem' when 1117 then 'PMem' when 1079 then 'Admin' when 1078 then 'ISAPIUnHealth' when 1074 then 'TimeLimit' when 1010 then 'PingFail' when 1011 then 'FatalComErr' when 1009 then 'UnExpEnd' when 1080 then 'ConfigChange' end as Reason" + _

     " from \\" + tmpComp + "\System where SourceName in ('W3SVC';'WAS') and EventID in (1117;1080;1079;1078;1077;1074;1011;1010;1009) and Message like '%"+tmpAppPool+ _

            "%' and TimeGenerated > to_timestamp('"+MinDate+"','yyyy-MM-dd hh:mm:ss')"

 

WScript.Echo "Querying", tmpComp, tmpAppPool, " events"

WScript.Echo

 

Set recordSet = myQuery.Execute(szQuery, myInputFormat)

 

If Err.number <> 0 then

            WScript.Echo "Could not execute query in QueryServer: ", Err.Description

End if

 

End Sub

 

Sub ParseCommandLine()

            Dim vArgs

 

            set vArgs = WScript.Arguments

 

            if vArgs.Count <> 2 then

                        DisplayUsage()

            Else

                        strComputer = vArgs(0)

                        strAppPool = vArgs(1)

            End if

End Sub

 

Sub DisplayUsage()

 WScript.Echo "Usage:  cscript.exe " & WScript.ScriptName & " ServerName AppPoolName"  & vbLF & vbLF & _

     "Example: " & vbLF & _

      WScript.ScriptName & " MyServer DefaultAppPool" & vbLF

      WScript.Echo

 WScript.Quit(0)

End Sub

 

The above script takes two parameters – the server name and the app pool name to match.  The query uses a LIKE comparison, so make sure the app pool name you give is unique enough to get correct data.  Here is the sample output (abbreviated):

 

RecycleTime        Uptime(mins) Reason
10/18/2006 1:49:40 PM       304 PMem
10/18/2006 8:45:56 AM       535 PMem
10/17/2006 11:50:04 PM      592 PMem
10/17/2006 1:58:28 PM       842 PMem
10/16/2006 11:56:41 PM    14973 Admin
… (cut for brevity)
9/27/2006 9:06:14 AM         14 VMem
9/27/2006 8:52:02 AM         96 VMem

Time Span: 9/27/2006 7:16:08 AM through 10/18/2006 1:49:40 PM
MinTime: 1
MaxTime: 14973

MyServer , MyAppPool Avg Uptime: 309 minutes

Count of Recycle Events
VirtualMem:     81
PrivateMem:     6
TimeLimit:      0
PingFail:       11
Admin:          1
FatalComErr     0
TermUnExp       0
ConfChg         1
ISAPI           0

 

It prints out the date/time of each event, the minutes elapsed since the previous event and the reason, followed by the time span of the events collected (first/last event used).  It gives you the minimum, maximum and average uptime of the application pool and a count of the various events the query found. 

 

If you see you are recycling based on your time limits, you probably have a pretty healthy app.  Looking at the above output, we can see the majority of the recycling is due to memory-related limits, with a few ping failures.  This app pool may have a memory leak, may be using the ASP.NET cache excessively, or could just have too much content to be contained in a single app pool process (ASPX content is compiled into DLLs that consume virtual memory); more investigation may be required.

Posted by MSCOM | 0 Comments

Microsoft.com and its ADFS Implementation

Active Directory Federation Services (ADFS) is a Windows Server 2003 R2 component that facilitates a trust between two or more organizations, allowing the sharing of resources while maintaining each organization’s ability to manage its own set of users. One of the significant challenges faced when trying to implement any new solution is simply keeping the system available. In our implementation of ADFS this was one of our key focuses in moving forward with the project. The ability to use single sign-on for ADFS-aware applications is a huge benefit; however, it can become a painful burden if the service is unreliable.

                                  

Two main areas of concern that we focused on were load balancing and policy file changes.  For load balancing we looked at the challenge at both a regional and a local level. Initially in production we will use global load balancing from Akamai or Savvis for the front-end web server clusters in two regions.  This will ensure availability through regional issues, and failover is automated based on the health-checking services provided by Akamai and Savvis.  Additionally, by going this route we have the capability of adding more clusters in the future without much difficulty. 

 

At the regional level we have paired up the servers for local failover through NLB clustering. We are not using any special load-balancing features so in reality this could be accomplished with hardware as well. However, as with a number of scenarios here at Microsoft we are simply using NLB due to the cost savings.  Overall this configuration will give us the necessary stability to ensure that the system will remain available with greater than 99.9% up time.

 

Another challenge we face is ensuring that the policy file, which is really the backbone of ADFS, is correctly distributed throughout our environment. To solve this we are leveraging another built-in feature of Windows Server 2003 (R2): Distributed File System Replication (DFS-R). On each of the backend servers we have enabled a DFS-R group membership with 24-hour, full-mesh replication. Simply put, no matter where the change to the policy file happens, it will be distributed to all servers. As long as we control who can change the file, we have a stable and highly available service.
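
Since the replication is scheduled rather than instantaneous, one simple way to confirm that a policy change has made it to every member is to compare the file's timestamp and size on each server.  The sketch below does exactly that; the server names, share name and file name are placeholders, not our actual topology.

' Sketch: compare the replicated policy file across member servers.
' Server names, share and file name below are placeholders.
Dim fso, servers, srv, path, f
Set fso = CreateObject("Scripting.FileSystemObject")
servers = Array("FEDSRV01", "FEDSRV02", "FEDSRV03")

For Each srv In servers
    path = "\\" & srv & "\PolicyShare\TrustPolicy.xml"
    If fso.FileExists(path) Then
        Set f = fso.GetFile(path)
        WScript.Echo srv & ": " & f.DateLastModified & "  (" & f.Size & " bytes)"
    Else
        WScript.Echo srv & ": policy file not found"
    End If
Next

Set fso = Nothing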


If you would like to read more on either of the Windows Server 2003 (R2) services we are leveraging, please see the following links:

ADFS: http://www.microsoft.com/WindowsServer2003/R2/Identity_Management/ADFSwhitepaper.mspx

DFS-R: http://www.microsoft.com/windowsserver2003/technologies/storage/dfs/default.mspx

 

Posted by MSCOM | 0 Comments

A Little VBS Script Saves the Day

A couple of years ago, on January 28th, 2004 to be exact, I was working another late night at the office when I got a visit from the General Manager of Microsoft.com. He was walking the halls desperately seeking someone to quickly develop a tool/utility to give to customers to help them determine if their personal computer was infected with a new virus. It was about 8 p.m., and there were very few people still at work that night. But I was, and agreed to take a crack at developing this tool. The Mydoom.B virus was running wild, fresh and free around the world that night, and I was going to help hunt it down.

 

It took me about two hours to deliver a solution. Below I describe what I did, how I did it, and my thinking at the time.

 

The goal of the tool would be to detect the presence of the virus by finding a specific file anywhere under a client machine’s Windows system folder. The first challenge was to decide what form the tool should take. What kind of executable, written in what language, would be simple, quick, and portable?

 

Even though I was already in love with C# by then, I decided a VBS script file, written with Visual Basic Scripting Edition (VBScript), would be good for this. I was no scripting god, but I had done enough of it by then to make it feasible. The VBS file would be small and portable, would run natively on Windows through the built-in Windows Script Host, and would require no support files and no installation. The VBScript code could readily be ported to run within a web page, or even to a little Visual Basic (VB) application, if need be. In addition, VB and its dialects were very popular, making it easy for other developers to work with the code I was going to write, if required.

 

But how fast and simply could I get this rush job done?

 

The core function of this tool would be code that finds a given file name (“ctfmon.dll”) residing in any folder in and under the Windows system folder. The code would have to locate the system folder and then search the files in and under it. For many software developers, coding this sort of file hunt is a straightforward task.

 

Code that traverses an entire hierarchical folder tree, looking for a specific file name, typically employs recursion. The recursive algorithm will get a list of all of the files in a folder, operating on each of them. Then it gets a list of the subfolders, and calls itself to execute on each of those. If there are no subfolders found under a particular folder, that recursion “drill-down” branch terminates.
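
For the curious, a bare-bones recursive version in VBScript might have looked roughly like the sketch below. To be clear, this is not the code that shipped that night, just an illustration of the approach I ended up skipping (and it does nothing about folders the script is not permitted to read).

' Sketch of the recursive approach (not what shipped): walk a folder tree
' looking for a specific file name and return the first full path found.
Function FindFile(folderPath, targetName)
    Dim fso, folder, file, subFolder, hit
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set folder = fso.GetFolder(folderPath)
    FindFile = ""
    ' Check every file in this folder first.
    For Each file In folder.Files
        If LCase(file.Name) = LCase(targetName) Then
            FindFile = file.Path
            Exit Function
        End If
    Next
    ' Then recurse into each subfolder.
    For Each subFolder In folder.SubFolders
        hit = FindFile(subFolder.Path, targetName)
        If hit <> "" Then
            FindFile = hit
            Exit Function
        End If
    Next
End Function

Calling FindFile on the Windows system folder with “ctfmon.dll” would return the first matching path, or an empty string if nothing was found.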

 

I could have gone this route, working out the code and carefully testing it. But instead, I thought it would be faster and safer to leverage a ready-made, built-in Windows operating system command: DIR! Yes, I am talking about the ancient DOS command, still in popular use today, that produces file and directory listings. With the simple inclusion of its “/s” switch, DIR will cover all subfolders – no fancy recursion required!

 

So I broke out my two DOS reference books from my bookshelf:

 

Running MS-DOS: Version 6.22

by Van Wolverton, August 1994

 

MS-DOS 6 Companion

by Joanne Woodcock, May 1993

 

and got to work. It was time to cook up a little “old school” magic.

 

Quickly came the challenge of working out how to invoke the DIR command from the VBScript code and how to examine its output to see whether it found any files. I also had to locate the Windows system folder and get the DIR command to operate on it. Here is the actual code I wrote for this task:

 

 

set fso = CreateObject("Scripting.FileSystemObject")

set wso = CreateObject("WScript.Shell")

set wse = wso.Environment("SYSTEM")

sTempDir = wse("TEMP")

sTempDir = wso.ExpandEnvironmentStrings(sTempDir)

sTempFileName = fso.GetTempName()

sTempFullPath = sTempDir & "\" & sTempFileName

sWinDir = wse("WINDIR")

sWinDir = wso.ExpandEnvironmentStrings(sWinDir)

wso.CurrentDirectory = sWinDir

sCmd = "cmd /c dir ctfmon.dll /b /s > " & chr(34) & sTempFullPath & chr(34)

wso.Run sCmd, 0, true

set ts = fso.OpenTextFile(sTempFullPath, 1)

on error resume next

sFileContents = ts.ReadAll

on error goto 0

if len(sFileContents) > 0 then

            msgbox "Virus was found at:" & chr(10) & chr(10) & sFileContents

else

            msgbox "Virus was not found"

end if

ts.Close

set ts = Nothing

set wse = Nothing

set wso = Nothing

set fso = Nothing

 

 

I hope this code looks pretty short to you, considering what it accomplishes. Less code means less chance of flaws in the logic. Let’s review this run of VBScript. This code:

 

sTempDir = wse("TEMP")

sTempDir = wso.ExpandEnvironmentStrings(sTempDir)

sTempFileName = fso.GetTempName()

sTempFullPath = sTempDir & "\" & sTempFileName

sWinDir = wse("WINDIR")

sWinDir = wso.ExpandEnvironmentStrings(sWinDir)

 

employs the Scripting.FileSystemObject and WScript.Shell objects to assemble a location for writing a temporary working file and to locate the Windows system folder. Coming up with this one statement:

 

wso.CurrentDirectory = sWinDir

 

took a big portion of the two hours, believe it or not. I was stumped for a while on how to get the DIR command to execute on a specified folder, and not the folder where the VBS file was residing. That code statement did the trick. The next two lines “do the heavy-lifting” in this little solution:

 

sCmd = "cmd /c dir ctfmon.dll /b /s > " & chr(34) & sTempFullPath & chr(34)

wso.Run sCmd, 0, true

 

This is the formulation and execution of the full DIR command, passing several arguments, such as the name of the file to search for and where to direct the output. The DIR command has to be prefixed with “cmd /c” because DIR is a built-in command of the command interpreter rather than a standalone executable, and the output redirection (the “>”) is also handled by the shell. The “/b” switch asks for bare output – just the names of the files, with no dates or sizes (combined with “/s”, it lists the full path of each match). The second argument on the wso.Run method – the “0” – hides the command shell UI, and the third argument – “true” – makes the script wait for the command to finish before reading the results.

 

The next run of code:

 

set ts = fso.OpenTextFile(sTempFullPath, 1)

on error resume next

sFileContents = ts.ReadAll

on error goto 0

 

reads the contents of the DIR output file into a program variable. The error suppression comes into play when none of the target files are found and the output file is empty; ReadAll raises an error on an empty file, so the handler simply leaves sFileContents empty. The final functional run:

 

if len(sFileContents) > 0 then

            msgbox "Virus was found at:" & chr(10) & chr(10) & sFileContents

else

            msgbox "Virus was not found"

end if

 

tells the user if the virus was detected, accomplishing the goal of the tool.

 

Crude, but effective! I don’t code VBScript much anymore, but I am ready at a moment’s notice. Around here, the more languages and programming tricks you know, the better equipped you are for putting out fires and preventing them. Keeping Microsoft.com up takes everything you’ve got.

Posted by MSCOM | 1 Comments

Why Is That Elephant In the Room?...View From The Top

This is the latest in the View From the Top series of blog posts written by the management of the Microsoft.com Operations and Portal Team. This contribution is from Todd Weeks, Sr. Director.

Have you ever had a manager tell you to “Work Smarter”, and then been really frustrated, or even offended, because it implied that you weren’t? Well, as a manager, I have had the overwhelming desire to use this phrase, but instead of just saying it and frustrating or offending, I’ve decided to add a little context to it. As I have boiled it down, there are a couple of fairly universal things that almost always impact our ability to “Work Smarter”.

 

The first thing has to do with the title of this post. For many reasons, in almost all projects, teams let some of the hard questions and concerns go unattended. But as you have probably noticed, the longer you let a lagging issue remain a lagging issue, the more disruptive it becomes to a project. What is tough about finally addressing the “Elephant in the Room” (the issue nobody seems to want to talk about but everyone knows is there) is that it is most likely going to cause conflict, and people usually want to get their jobs done without conflict. The way this ties into not “Working Smarter” is that if you don’t address the issue, everyone will not be on the same page and heading in the same direction. When there is a lack of agreement or understanding, people still do work, code is still written, milestones are still checked off; but will all that work need to be re-done to get us back on track when we finally do decide to address the issue? Usually, the longer your teams avoid addressing large issues, the more re-work and additional work is required to come together. We all have full workloads, but by knowingly avoiding issues everyone sees are there, we are knowingly adding work to our plates that has absolutely no value. It actually has negative value, because you will need to do more work to reach the same goal eventually.

 

So, how do we bridge this social gap and begin to inspire people to address conflict more easily? There are many tools out there for people to use; the one we are trying throughout the team is called a SWOT (Strengths, Weaknesses, Opportunities, Threats) analysis. While many project teams use SWOTs to look at their projects in the initial phase, we are going to use them as the monthly or milestone report structure for all our projects. What we are looking to do is get the “issues” addressed sooner by doing SWOTs more frequently than normal.

 

What is great about this process is that the issue can come up in many forms in the SWOT. Perhaps fixing the issue is an “Opportunity”; now it can be broached in a more positive light, possibly avoiding conflict. An issue may also come up as a “Threat” or a “Weakness”, and bringing threats, weaknesses and opportunities up as part of the normal process helps break down some of the social barriers that might stop people from raising the issues at all.

 

The final piece that can’t be forgotten when using a new process or tool like SWOT is this: reward the behavior of finding and bringing up these issues in a way that resolves them and, more importantly, resolves them without tension. If people on a team are seeing potential problems and getting them addressed and solved before they cause more work, they have just helped everyone on that project “Work Smarter”. And as you reward people and look to highlight the greater impact they may have on a team, the SWOT is a great vehicle for tracking where the ideas came from and what their impacts were. Now people are not just getting their jobs done and feeling inspired to approach hard issues; they are looking out for others and helping them avoid unnecessary work so they too can get their jobs done. Make that behavior core to your reward systems and you will see the culture of your team change, and people will “Work Smarter”.

 

The second thing I want to touch on that drives me to want to say “Work Smarter” is hearing people say “that’s not my problem,” or pushing back on something someone is asking of them. You just can’t take on everything, but the way you respond to someone asking you about work that isn’t your deliverable can make all the difference. The small amount of time it takes to pay attention and get that person directed to the right person may be a huge time savings. Taking just a few moments to ask yourself “Can I help?” (even though this isn’t your deliverable) might save hours. Who hasn’t seen those frustrating mail threads where people debate who should do something? When you look at the amount of time put into that thread, it is often more than it would have taken to just do the work.

 

The goal of the entire team, working as a unit, should be to get work on and off its plate efficiently as a group. We don’t want a culture of randomization where we are just looking for ways to solve quick, small problems that aren’t ours; but on a case-by-case basis, see if a bit of your time might actually go a long way toward saving not only your own time in the future, but others’ as well. As a bonus, you get something done so it isn’t out there hanging on the group’s “to-do” list. When it comes down to it, taking the time to just listen usually takes only a minute or two, and more importantly it reinforces the kind of behavior a team should aspire to: one where people are willing to ask questions and be open with one another.

 

As a manager, you should be looking to say to your team, “Work Smarter”. For me, I want to be prescriptive when I say it so it can have the most impact and achieve the desired result. And I know that if I am going to ask, having tools in mind like a SWOT analysis, and then reinforcing the behavior with our reward systems, goes a long way toward our group “Working Smarter”, and it will help me as a manager not randomize my team by letting 200 people each try to figure out “the goal” on their own.

Posted by MSCOM | 1 Comments

Add a little Development, mix in some Security, a dash of Program Management, apply a liberal amount of IIS and SQL…Part of the Recipe for MSCOM Operations’ Blog

 

Readers of this blog might have noticed (or been puzzled by) the variety of subject matter that we present. We have had blog posts ranging from “Why Dogfood?” to “Scaling Your Windows…and other TCP/IP Enhancements” to “Where Oh Where Did All of the Microsoft.com SQL Clusters Go?” So what drives the topics for this blog going forward?

 

Very recently we had a re-organization that added to our traditional Operations Engineering charter. We are now the Microsoft.com Operations and Portal Team. This realignment coupled the traditional role of Microsoft.com Operations with a Business Infrastructure team, which includes the Lab Hosting Team, Business Management, Services Management and Release Management, and with the Portal Development Team, which includes Program Management, Development and Test.

 

MSCOM Operations (the System Engineers that provide enterprise engineering) has worked with these teams for a long time. These are the functional teams it takes to develop software, get it into production and run it effectively and efficiently. This new, tightly aligned structure allows us to take this powerful engine to a new level of performance and maturity.

 

The goal of this blog is still to provide IT Professionals with information they can use. Each of our sub-teams has specific roles and responsibilities in this enterprise, and each is responsible for producing a blog post on its specific area on a rotating basis. Simple math here: there are twelve of these sub-teams, which equates to one blog post from each team every twelve weeks.

 

Does that mean that if you read a great IIS post from us you will have to wait 12 weeks before that topic comes up again? Not necessarily; we encourage all of our folks to submit posts anytime they want. Do they all deliver on time? I wish I could say that they did, but those teams still have their “day jobs” as the first priority. Despite having to actually work (imagine that), these folks regularly provide timely information to the IT Pro community.

 

Here are the functional teams that will provide content for this blog site:

 

Evangelism – where we get to coordinate this blog and other customer-facing interactions: web casts, articles, white papers and customer engagements.

 

MSCOM Ops Management – what we call “View From the Top”, which gives MSCOM management a blog forum to address various management-related subjects.

 

Web Engineering – written by the web engineers that actually run the MSCOM IIS servers and our internet-facing web environment.

 

SQL Engineering ‑ written by the SQL engineers that actually run the MSCOM SQL servers and our backend environment.

 

Debug – a team of senior engineers that specialize in advanced troubleshooting techniques and have deep knowledge of the MSCOM platform, from the Windows Server OS level through the application level.

 

Program Management – an essential team that plays an integral role in our development of the next iteration of the MSCOM portal. These folks are tasked with a variety of deliverables including (but not limited to) writing specifications, keeping the projects on track and ensuring that project status is properly communicated.

 

Development – these folks are writing the code that will constitute the aforementioned next iteration of the MSCOM portal, among other things.

 

Test – the Developers’ best friend; folks that work to ensure the new code is bug free when it hits the web.

 

Release Management – this team has worked closely with MSCOM Ops for years as an extension of the product development team’s efforts. They have recently been re-organized into the central services organization that oversees policy, process and business management for both MSCOM Ops and the MSCOM Portal product development team. This team owns the Release Criteria and ensures product releases deploy smoothly, with minimal customer impact, into the various Ops-managed environments.

 

Service Management ‑ responsible for on-boarding new customers to MSCOM, working with existing customers to provide guidance and smooth the way for releases.

 

Security/Architecture ‑ the folks that are keeping the hackers at bay, hardening our infrastructure and providing architectural guidance.

 

Tools – we are fortunate to have a dedicated Tools team, a talented group of developers that provides us with custom applications that help us monitor, report on and manage all of our environments.

 

If you have a topic you are interested in, send us an email at mscomblg@microsoft.com.

Posted by MSCOM | 1 Comments

MSCOM Operations Presents At DRJ Conference

Recently Sunjeev Pandey and Paul Wright presented Microsoft.com Operations’ approach to resilience, availability, and disaster recovery at the Disaster Recovery Journal’s DRJ conference in San Diego.  They had to make some last-minute changes to the presentation and promised to post the latest deck on our blog.  So, without further ado, here is the link to download the presentation.

 

Please feel free to post any questions here and we’ll answer them as best we can.

 

Posted by MSCOM | 0 Comments