Archive for the ‘wiki’ Category

Wiki Explorator Up and Runnin’

Wednesday, January 14th, 2009

A new year comes with the promise of a fresh start. Though we’re no experts at keeping those darn resolutions for long, we managed to get our online service for the analysis of wikis up and runnin’. And that’s a good way to call in the new year, don’t you think? Without further ado, we present to you, our new Wiki Explorator.

A final note of caution before you’re off to play. The service is in its beta stage, which basically means that some things are likely to go awry. Therefore, we appreciate any feedback, whether it’s a bug report or a hail mary. Moreover, the service is in German only for now, but the very report you get is in English. Go figure.

Wikis in the Flash

Tuesday, November 4th, 2008

We’ve been deep into network analysis with our wikis. We’ve turned and twisted hyperlink networks, coauthorship networks, and collaboration networks. We’ve put every measure and every method out of the network analysis toolbox to good use.

But now we have a new girlfriend, and her name is SoNIA.

She puts our wikis back in the flash (literally, and no typo). Check out the below network.

Each node represents a user in a wiki, and each directed edge represents a response from one user to another on at least one page they both have been working on. The responses are timed within a weekly frame, which you can follow on the top left corner where slice indicates the respective week. Moreover, nodes are colored according to the following schema:

  • White. User is not yet registered in the wiki (strictly speaking, she doesn’t exist yet, at least not in the network)
  • Orange. User just registered but hasn’t edited any pages, so far
  • Green. User is actively editing pages
  • Red. User is inactive, not editing pages anymore
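For the curious, here is a minimal sketch of how such weekly slices and node colors could be derived from raw response data. It is not how SoNIA works internally. The input format (responder, original author, day), the function names, and the rule that a user counts as inactive once her last edit lies in the past are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def weekly_slices(responses, start):
    """Group (responder, author, day) triples into one edge set per week.

    Each edge is directed from the responding user to the user she
    responded to. Week 0 begins on `start` (an assumed reference date).
    """
    slices = defaultdict(set)
    for responder, author, day in responses:
        week = (day - start).days // 7
        slices[week].add((responder, author))
    return dict(slices)


def node_color(week, registered, first_edit, last_edit):
    """Color a user for a given week, following the schema in the list above.

    `registered`, `first_edit`, and `last_edit` are week numbers;
    `first_edit` and `last_edit` are None for users who never edited.
    Treating every week after the last edit as "inactive" is an assumption.
    """
    if week < registered:
        return "white"   # not yet registered
    if first_edit is None or week < first_edit:
        return "orange"  # registered, no edits so far
    if week > last_edit:
        return "red"     # not editing anymore
    return "green"       # actively editing


responses = [
    ("bob", "alice", date(2008, 1, 2)),
    ("alice", "bob", date(2008, 1, 3)),
    ("carol", "alice", date(2008, 1, 10)),
]
slices = weekly_slices(responses, start=date(2008, 1, 1))
```

Each entry of `slices` can then be drawn as one frame of the animation.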

Don’t you wish you had a girlfriend this pretty?

Worst Practices for Managing Wikis in Organizations

Tuesday, July 22nd, 2008

Following best practices is the royal road to success for a lot of today’s companies. The basic idea behind this approach to management is of course to learn from the best in the business and thus to avoid all the trouble of learning from your own mistakes. No need to get your hands dirty, right?

Unfortunately, the world is just a little more complex than that. Others’ best practices may well work for them, but there is no guarantee that they work for you. On the contrary, it is more likely that they don’t work for you at all. Reasons are easy to come by: Different products, different markets, different organizations, and so on, and so forth.

Instead of doing what others do best, try avoiding what others do worst. It is much harder to follow best practices than to stay away from worst practices. With respect to managing wikis in organizations, then, here are the top 3 worst practices:

Employee Motivation. Managers keep asking me how they can properly motivate their employees to work with the wiki. Some already tried giving away incentives for edits (“Ten edits for a free cup of coffee, a thousand for a trip to Vegas!”), others make writing a part of employees’ workload (“A thousand edits by the end of the year and you’ll be fine!”). I always answer them with a counter question: Why do you have to motivate your employees at all? My point is to give your employees the benefit of the doubt. They don’t need extra motivation to work with the wiki as long as they see a benefit in using it.

The Purpose. What’s the purpose of the wiki? There is no more central question that managers have to answer before they go about telling someone from IT to install a wiki for all other employees (and not just the nerds in IT who had a perfectly running wiki for years). So, what is the purpose of your wiki? Say it in one sentence, even one word, I dare you. If you cannot answer in simple words such as “software documentation,” “to-do lists,” or “project notes,” then you may need to rethink your mission statement (you have a mission statement, don’t you?). The purpose of the wiki directly translates into the benefit that, ultimately, employees see in using the wiki.

Management. Yes, that’s right, management is the worst practice of them all. A wiki is a highly democratic medium. There is no easier way of killing it than imposing layers and layers of management upon it. Avoid managing your employees’ interest by making them “page patrons” or the like (so that each employee is responsible for some particular page or pages). If they are experts in some field, they will want to pass on their knowledge in one way or another. And if you can’t get some of them to put down what they know on a wiki page, well, maybe then a wiki is not the right way to do it (try making an interview-type podcast, most experts just love to talk about their work). Instead of putting up roles and rules, think of irritating a wiki by introducing a code of conduct (but be sure to put it in the wiki itself).

These are certainly the worst practices for managing wikis in organizations that we came across in our research. There are others, of course. And maybe even best practices, who knows.

Lorenz, Gini, and Pareto

Friday, July 4th, 2008

The Lorenz curve is a graphical representation of social inequality of some sort. For example, economists use it to display the income distribution for households. The Gini coefficient further condenses the Lorenz curve into a single measure. And last in line is the Pareto principle, sometimes referred to as the 80/20 rule, which is a frequent observation of a certain distribution of social inequality.

All of these well-known concepts easily apply to research on corporate and public wikis. A first shot is to plot the cumulative distribution of revisions against users to gain an insight into the inequality of work done in wikis. Of course, whether or not revisions are a good measure for work is arguable, not least because the quality of revisions is a completely different issue.
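As a rough illustration of that first shot, the following sketch computes the points of a Lorenz curve and the corresponding Gini coefficient from a list of revision counts, one count per user. The function names and the toy numbers are made up; the Gini coefficient is taken as one minus twice the area under the Lorenz curve, approximated with trapezoids.

```python
def lorenz_curve(revisions_per_user):
    """Return (share of users, share of revisions) points, poorest users first."""
    counts = sorted(revisions_per_user)
    total = sum(counts)
    if not counts or total == 0:
        raise ValueError("need at least one user with at least one revision")
    points = [(0.0, 0.0)]
    running = 0
    for i, count in enumerate(counts, start=1):
        running += count
        points.append((i / len(counts), running / total))
    return points


def gini(revisions_per_user):
    """Gini coefficient as 1 - 2 * (area under the Lorenz curve)."""
    points = lorenz_curve(revisions_per_user)
    area = sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
    return 1 - 2 * area


# Toy data: four users, one of whom does all the work.
unequal = gini([0, 0, 0, 10])
# Toy data: four users who share the work evenly.
equal = gini([5, 5, 5, 5])
```

With perfectly even work the coefficient is 0; the more the revisions concentrate on a few users, the closer it gets to 1.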

At any rate, Jimmy Wales claims that the top 2% of all Wikipedians account for more than 80% of the work in total. Not so much Pareto, obviously. Still, his claim is rather well informed, as our below plot confirms.
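A claim of this kind is easy to check against any list of revision counts. The sketch below computes the share of all revisions contributed by the most active fraction of users; the function name and the toy data are assumptions for illustration, not the figures behind the plot.

```python
import math


def top_share(revisions_per_user, fraction):
    """Share of all revisions made by the top `fraction` of users."""
    counts = sorted(revisions_per_user, reverse=True)
    # Always include at least one user, rounding the group size up.
    k = max(1, math.ceil(fraction * len(counts)))
    return sum(counts[:k]) / sum(counts)


# Toy data: the top half of four users accounts for 8 of 10 revisions.
share = top_share([6, 2, 1, 1], 0.5)
```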

Lorenz Curves of Corporate Wikis and Wikipedia

What’s more interesting is that corporate wikis very well display the 80-20 rule. So there you go, another hint that corporate wikis don’t obey the laws of the public sphere where people come and go as they please. After all, membership in an organization more or less establishes a firm user base of corporate wikis, especially if there is no other means to an end. The silver bullet for making your wiki a success is thus to give your employees no other choice of media.

Web 2.0: An Empirical Account

Monday, June 9th, 2008

Web 2.0 is little more than a buzzword. Then again, there are applications and services out on the Internet that defy the logic of the old world. I recently co-edited a book with Paul Alpar which addresses the Web 2.0 from an empirical perspective.

Web 2.0

There are four chapters on wikis. Klaus Stein and Claudia Hess discuss reputation as a mechanism which excellent articles in Wikipedia thrive on. Anja Ebersbach, Knut Krimmel, and Alexander Warta derive several measures for the analysis of corporate wikis. Claudia Müller takes on wikis in terms of knowledge management. And last but not least, I scale up from communication to collaboration to better understand corporate wikis.

The table of contents reveals a little more: The first part of the book focuses on weblogs, part two takes on wikis, part three deals with social network sites, and part four with social news. Be sure to check it out.

How much of Wikipedia is in your Wiki?

Friday, September 14th, 2007

Take Wikipedia, get rid of all its help, user, and discussion pages, and you’re basically left with articles, definitions, and stubs. That’s the core of Wikipedia, the encyclopedia we all love (and some of us hate, mainly for plagiarism by students). Now, take a guess at what percentage of the core the articles make up. My gut feeling was a lot, say, 90+ percent or so, which isn’t that far off. According to Voss (2005, p. 6), it’s somewhere between 90 and 95 percent (for the German Wikipedia), depending on what counts as a stub. So far, so good.

But Wikipedia is open to the public, that is, to most of the Internet users worldwide (’cept China, e.g.). That’s certainly different for any corporate wiki. An interesting question is thus, just How much of Wikipedia is in your wiki? Or, in other words, What is the percentage of articles in corporate wikis?

In our research project, we have access to a couple of corporate wikis. The data keeps rollin’ in and among the first things I did was to strip one of the wikis of all help, user, and discussion pages. I was left with a little more than 700 pages in total. Far smaller than the claimed two million articles of the English Wikipedia, of course, yet a manageable size to apply some genre analysis.

So, 700+ pages of qualitative data analysis later, I took the below screenshot (yes, screenshot, Graphviz wouldn’t render the PNG in time, whereas the DOT itself only took a couple of minutes).

Corporate Wiki Network

Nodes are pages and edges are links between pages. The network is pretty dense (I’ll calculate some measures later on), sporting somewhat more than 2,500 links among the 700+ pages. The red nodes are articles as one would expect to find them in Wikipedia, that is, they are pieces of writing covering a particular topic, they are structured in a particular manner (e.g., introduction, body, conclusion), and they are authored by members of the organization. All in all there are 68 articles out of the 700+ pages. That’s a little less than 10 percent, and quite a different picture from Wikipedia.
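A few of the simplest measures for such a page network can be sketched as follows. The representation (a set of page names, a set of directed source/target link pairs, a set of pages coded as articles) and the toy data are assumptions; the actual wiki data is not reproduced here.

```python
def network_summary(pages, links, articles):
    """Basic measures for a directed page network without self-links.

    `links` is a set of distinct (source, target) pairs, `articles`
    the subset of `pages` that was coded as articles.
    """
    n = len(pages)
    m = len(links)
    return {
        # Existing links divided by all possible directed links.
        "density": m / (n * (n - 1)) if n > 1 else 0.0,
        # Average number of outgoing links per page.
        "links_per_page": m / n,
        # Share of pages that are articles.
        "article_share": len(articles) / n,
    }


# Toy network with four pages, three links, and one article.
summary = network_summary(
    pages={"A", "B", "C", "D"},
    links={("A", "B"), ("B", "C"), ("C", "A")},
    articles={"A"},
)
```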

To be honest, a little less than another third of all the pages in this corporate wiki are actually articles, too. But they are more or less copy & paste works from already published articles. I decided to name them features just to distinguish them from articles by organizational members. Features serve the same function as articles, that is, they cover a particular topic, and so on. However, in terms of wiki functionality, they are simply mirrors of someone else’s work outside the membership of the organization. Features are of lesser interest since they don’t come about through the (cooperative) authoring of organizational members. Still, articles and features taken together barely make up 40 percent of the corporate wiki!?

Back to the articles that are actually of interest. Take a closer look at the following subnetwork of only articles and other pages the articles link to or are linked from.

Subnetwork of Articles

The articles are fairly well linked among the other pages. However, among the articles themselves, there are hardly any links at all (it’s hard to see, but I colored the inter-article links in red, too). This suggests that the articles cover substantially different topics — and looking at the articles themselves reveals this assumption to be just right.
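Extracting such a subnetwork is a small filtering step. The sketch below keeps every link that touches at least one article and separately collects the links that run between two articles; the names and the toy data are assumptions for illustration.

```python
def article_subnetwork(links, articles):
    """Return (links touching an article, links between two articles)."""
    sub = {(s, t) for s, t in links if s in articles or t in articles}
    inter = {(s, t) for s, t in sub if s in articles and t in articles}
    return sub, inter


# Toy network: A and C are articles, B and D are other pages.
links = {("A", "B"), ("B", "C"), ("C", "D"), ("A", "C"), ("B", "D")}
sub, inter = article_subnetwork(links, articles={"A", "C"})
```

A small `inter` next to a large `sub` is the pattern described above: articles are well connected to other pages but hardly to each other.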

These first findings suggest a couple of additional questions, for example, What genres do we find in corporate wikis other than articles and the expected definitions and stubs?, Are those other genres also found in other corporate communication media (e.g., in corporate blogs, document management systems, etc.)?, and, if not, Are those other genres in any way innovative means for the organization?

I’ll follow up on those questions in a later post ;-)

Quantitative and Qualitative Data Analysis of Weblogs and Wikis - An Easy-Bake Recipe.

Monday, August 13th, 2007

Most social scientists shy away from command-line tools. They live in a world of drag & drop, just hoping for the right tool to get their dirty work done. Aside from the unfortunate fact that data is hard to come by in the first place, it is also far too messy (and far too interesting, I must add) to be boiled and cooked with your standard all-in-one tool. Just consider the unlimited variety of weblogs and wikis. Yes, they all share a basic structure, chronologically reversed postings in case of weblogs, for example. But all in all a quantitative analysis of these websites takes a considerable amount of coding, that is, dirty work. The tools that claim to do the analysis all by themselves are a little bit like fast food, there is only a limited variety and everything tastes the same.

Now, if you live on the lighter side of life, you might enjoy a good home-cooked meal once in a while, don’t you? The same goes for work: why settle for the simple statistics you get with standard software when there is so much more to science right under the hood of all those cluttered windows? All you need to do is to put a little faith in the command line and you’re ready to go. Just follow the below three-step instructions, it’s as easy as making a ham sandwich.

First, there’s the ingredients. The below shopping list is just a suggestion, of course. If you look hard enough, you may well find a good or even better substitute for any of the following tools. A note to the cook, the list is based on the assumption that your stove is a Mac or a Linux machine, although most of the tools work reasonably well on Windows.

With all the ingredients on the table or desktop, respectively, take a look at wget. “GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely-used Internet protocols. It is a non-interactive commandline tool, so it may easily be called from scripts, cron jobs, terminals without X-Windows support, etc.” The basic usage is

> wget [option] [URL]

The real beauty of wget is its ability to crawl an entire site and thus mirror it locally. Just turn on the -r option for recursive download and you’ll get all pages off a site without even knowing their URLs. But be careful to restrict the host to get data from, otherwise wget follows all links and you get a lot (and I mean, a LOT) more traffic than you want. So in order to get, say, the top rated Bamblog, you need to call

> wget -r -H -D bamblog.de http://www.bamblog.de

It may take several minutes to get the entire site, depending on the site’s size and the speed of the server it is hosted on. If you’re only interested in text analysis, there is an option to exclude the download of pictures and other media. See the wget manual for details.

With wget’s job done, you get a local copy of the entire site in a single folder. Inside that folder you’ll likely find several hundred files, most of which you are not even interested in. With an eye on the content of the site, you want to look for those HTML files that contain the actual postings of a weblog, the pages of a wiki, or the like. For example, Wordpress by default numbers all postings and puts them in an archive folder. Most likely, those files are the ones you want to analyze.

Before the actual analysis, though, you need to boil your data for some time, that is, turn the raw data into something your CAQDAS can manage. This is where textutil or DocFrac comes in. “textutil can be used to manipulate text files of various formats, using the mechanisms provided by the Cocoa text system.” In effect, it converts all HTML files into RTFs, which most CAQDAS can process. On the command line, you pass all files with a .html extension to textutil to convert to .rtf with

> find . -name \*.html -print0 | xargs -0 textutil -convert rtf

Now all that’s left to do is to load the RTFs into TAMS, Weft, or any other tool and analyze away, that is, code the content and see what the data holds. In case of Jan Schmidt’s Bamblog, you currently get a little more than 600 postings including all comments. That’s a lot of coding to do and you may wish to get some of it done in routine fashion. Well, there is help, at least for the site’s basic structure. Unfortunately, this is also where it gets a little more messy. The simple command line call of any of the above tools is not enough, you need to familiarize yourself with regular expressions. This will help you to automatically code the structure of the RTF file any which way you need it. For example, postings always start with the title and the date they are posted on. If you search for this particular pattern, say, August 13, 2007 with

> (Januar|Februar|März|April|Mai|Juni|Juli|August|September|Oktober|November|Dezember) ([0-9]+, 200[0-9])

and replace it with

> \'7bdate\'7d$1 $2\'7b\'2fdate\'7d

then you’ll get your RTF to look something like

> {date}August 13, 2007{/date}.
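The same search and replace can be scripted. Here is a minimal sketch using Python’s re module on plain text, where the curly braces can be written literally instead of as RTF escapes; the function name `tag_dates` and the sample sentence are made up for illustration.

```python
import re

# German month names, as in the search pattern above.
MONTHS = (
    "Januar|Februar|März|April|Mai|Juni|Juli|August|"
    "September|Oktober|November|Dezember"
)
DATE_PATTERN = re.compile(rf"({MONTHS}) ([0-9]+, 200[0-9])")


def tag_dates(text):
    """Wrap every matching date in {date}...{/date} codes."""
    return DATE_PATTERN.sub(r"{date}\1 \2{/date}", text)


tagged = tag_dates("Posted on August 13, 2007 by Jan")
# tagged == "Posted on {date}August 13, 2007{/date} by Jan"
```

When working on the RTF source directly, the braces in the replacement have to be escaped as shown in the replacement string above.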

Do this for any recurring pattern with the code you need and you’ll be able to analyze your data in no time. For this last example, I simply counted all comments (i.e., all occurrences of my code) on each one of the 600+ pages.
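Counting a code per page is just as easy to script. The sketch below assumes that the coded pages are available as plain-text files with literal {code} tags and uses a hypothetical code name, "comment"; the actual code name and file layout may differ.

```python
from pathlib import Path


def count_code(text, code):
    """Count the opening tags of a code, e.g. {comment}, in a text."""
    return text.count("{" + code + "}")


def counts_per_file(folder, code, pattern="*.txt"):
    """Map each file name in `folder` to the number of opening tags it holds."""
    return {
        path.name: count_code(
            path.read_text(encoding="utf-8", errors="replace"), code
        )
        for path in sorted(Path(folder).glob(pattern))
    }


# Hypothetical coded snippet with two comments.
sample = "{comment}Nice post!{/comment} {comment}I disagree.{/comment}"
n_comments = count_code(sample, "comment")
```

The resulting dictionary of counts per page is all a plotting tool needs.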

Comments per Postings on Bamblog

All in all, the cooking time was a little more than two hours from the initial download of the data to the plot above. The largest part of the preparation is to figure out the structure of the data and thus what to search for and replace it with. However, once you’ve done that, it may fit other data sources as well, for weblogs frequently use Wordpress and wikis based on MediaWiki are in the thousands. The sky is the limit.

Bon appétit.

Genres of Organizational Communication

Tuesday, July 17th, 2007

More and more organizations employ wikis. The reasons for doing so are plentiful, ranging from hopes of increasing productivity to beliefs in democratic collaboration. Whether or not wikis and other new media meet these hopes and beliefs remains an open question, particularly since wikis have only been around for a relatively short time. Besides the actual shortage of data, there is little genuine research on the topic, indeed.

The criteria with which organizations measure the efficiency and effectiveness of wikis are one way to answer the above question, of course. Yet, from a research perspective they are less interesting. What’s more interesting are the unintended and unintentional consequences that come with the use of wikis. For example, how do wikis change existing organizational structures?, does decision making pertain to wiki users?, and in what way do wikis influence work processes?

In our research, we first approach wikis in terms of genre analysis. More specifically, we distinguish several genres of organizational communication as they are produced and reproduced in wikis. For example, meeting minutes are a particular genre which is frequently found in wikis. Interestingly, this genre is produced and reproduced in exactly the way we know meeting minutes from, say, jot-down notes which are later distributed via e-mail; there is no sign of collaborative authorship or any other feature provided by wikis. Nonetheless, we expect to see the innovation of wikis influence genres over time, much like e-mail changed several genres a little more than a decade ago.

Our next step is to compile a more comprehensive first look at genres of organizational communication. We’ll post the table here. Soon.

We ain’t tellin’.

Sunday, June 24th, 2007

One of the first things to consider in empirical research is where you can get your data from. Before sending out a survey, you need to know who to send it to. Knowing your audience is not only key to a good book, presentation, or what have you, but it is certainly a major concern in doing empirical research. In other words, and for our research on Wikis in Organizations in particular, it is crucial to cooperate with partners as early as possible.

Of course, we had our eyes on organizations before the project even officially started. Still, it is only now that we are able to talk to them about the project in a little more detail and actually see them in person, which helps a great deal in convincing them to participate in research that they mostly benefit from at a later point in time.

In the first couple of meetings we had with partnering organizations, euphoria about our research is not the problem. The people in charge are genuinely interested in what we do and, of course, what we can do for them. The problem is more on the side of the business as we are met with concerns about data security and privacy issues. And we do understand these concerns. We are not out to track employee use of wikis and report them back to management. Neither are we going to publish any data without anonymizing it beforehand. These issues are part of the non-disclosure agreement that we are signing as well as part of a code of conduct for doing scientific research. And they’re just common sense to follow so that you don’t get into trouble, either with the organization or the scientific community.

Unfortunately, that also means that we’re limited in this weblog’s first-hand reports on corporate wikis. So far, we’ve seen different usage patterns, different user roles, and much more, very exciting stuff for us as researchers, and probably and hopefully interesting stuff for you, the audience.

Wikis in Organizations.

Wednesday, June 13th, 2007

The University of Bamberg made it official today. Our project is well on its way to research Wikis in Organizations.

Project Team

Over the next two years, the interdisciplinary research team, led by Prof. Dr. Anna Maria Theis-Berglmair (back row, on the very left) and Prof. Dr. Christoph Schlieder (back row, second from the left), ventures into the wondrous workings of corporate wikis, taking a look at how organizations turn an invention into a (hopefully) successful innovation.

This weblog accompanies the research. It reveals insights into ground-breaking theoretical developments (that’s the vision, at least), follows the enthralling adventures in recursive thinking (there’s the mission), and documents everyday life of scientists (and that’s how it all works).

Stay tuned.