Back to Back Champion in Bioinformatics: 2009

Monday, August 17, 2009

This page contains material on, or relating to, conditional random fields. I shall continue to update this page as research on conditional random fields advances, so do check back periodically. If you feel there is something that should be on here but isn't, then please email me (hmw26 -at- srcf.ucam.org) and let me know.

introduction

Conditional random fields (CRFs) are a probabilistic framework for labeling and segmenting structured data, such as sequences, trees and lattices. The underlying idea is that of defining a conditional probability distribution over label sequences given a particular observation sequence, rather than a joint distribution over both label and observation sequences. The primary advantage of CRFs over hidden Markov models is their conditional nature, resulting in the relaxation of the independence assumptions required by HMMs in order to ensure tractable inference. Additionally, CRFs avoid the label bias problem, a weakness exhibited by maximum entropy Markov models (MEMMs) and other conditional Markov models based on directed graphical models. CRFs outperform both MEMMs and HMMs on a number of real-world tasks in many fields, including bioinformatics, computational linguistics and speech recognition.
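To make the idea of a conditional distribution over label sequences concrete, here is a minimal sketch of a linear-chain CRF in Python. It is illustrative only and is not taken from any of the papers listed below. The names `emit` and `trans` are hypothetical: `emit[t][y]` is assumed to hold the (already computed) weighted feature score for label `y` at position `t` given the whole observation sequence, and `trans[a][b]` the score for moving from label `a` to label `b`. The normaliser Z(x) is computed with the standard forward recursion in log space.

```python
import math
from itertools import product


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(emit, trans, labels):
    """Unnormalised log-score of one label sequence.

    emit[t][y]  -- hypothetical score of label y at position t (depends on x)
    trans[a][b] -- hypothetical score of the transition a -> b
    """
    score = emit[0][labels[0]]
    for t in range(1, len(labels)):
        score += trans[labels[t - 1]][labels[t]] + emit[t][labels[t]]
    return score


def log_partition(emit, trans):
    """log Z(x): log of the summed exponentiated scores of all label sequences,
    computed with the forward recursion instead of explicit enumeration."""
    n_labels = len(trans)
    alpha = list(emit[0])
    for t in range(1, len(emit)):
        alpha = [
            emit[t][b]
            + log_sum_exp([alpha[a] + trans[a][b] for a in range(n_labels)])
            for b in range(n_labels)
        ]
    return log_sum_exp(alpha)


def conditional_log_prob(emit, trans, labels):
    """log p(y | x) = score(x, y) - log Z(x)."""
    return sequence_score(emit, trans, labels) - log_partition(emit, trans)
```

Because the model is normalised over whole label sequences for a fixed observation sequence, the probabilities returned by `conditional_log_prob` sum to one across all possible labelings of that sequence. This per-sequence (rather than per-state) normalisation is what distinguishes the sketch from an MEMM-style model.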

tutorial

Hanna M. Wallach. Conditional Random Fields: An Introduction. Technical Report MS-CIS-04-21. Department of Computer and Information Science, University of Pennsylvania, 2004.

papers by year

2001

John Lafferty, Andrew McCallum, Fernando Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML-2001), 2001.

We present conditional random fields, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models and stochastic grammars for such tasks, including the ability to relax strong independence assumptions made in those models. Conditional random fields also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states. We present iterative parameter estimation algorithms for conditional random fields and compare the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.

2002

Hanna Wallach. Efficient Training of Conditional Random Fields. M.Sc. thesis, Division of Informatics, University of Edinburgh, 2002.

This thesis explores a number of parameter estimation techniques for conditional random fields, a recently introduced probabilistic model for labelling and segmenting sequential data. Theoretical and practical disadvantages of the training techniques reported in current literature on CRFs are discussed. We hypothesise that general numerical optimisation techniques result in improved performance over iterative scaling algorithms for training CRFs. Experiments run on a subset of a well-known text chunking data set confirm that this is indeed the case. This is a highly promising result, indicating that such parameter estimation techniques make CRFs a practical and efficient choice for labelling sequential data, as well as a theoretically sound and principled probabilistic framework.

Thomas G. Dietterich. Machine Learning for Sequential Data: A Review. In Structural, Syntactic, and Statistical Pattern Recognition; Lecture Notes in Computer Science, Vol. 2396, T. Caelli (Ed.), pp. 15–30, Springer-Verlag, 2002.

Statistical learning problems in many fields involve sequential data. This paper formalizes the principal learning tasks and describes the methods that have been developed within the machine learning research community for addressing these problems. These methods include sliding window methods, recurrent sliding windows, hidden Markov models, conditional random fields, and graph transformer networks. The paper also discusses some open research issues.

2003

Fei Sha and Fernando Pereira. Shallow Parsing with Conditional Random Fields. In Proceedings of the 2003 Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics (HLT/NAACL-03), 2003.

Conditional random fields for sequence labeling offer advantages over both generative models like HMMs and classifiers applied at each sequence position. Among sequence labeling tasks in language processing, shallow parsing has received much attention, with the development of standard evaluation datasets and extensive comparison among methods. We show here how to train a conditional random field to achieve performance as good as any reported base noun-phrase chunking method on the CoNLL task, and better than any reported single model. Improved training methods based on modern optimization algorithms were critical in achieving these results. We present extensive comparisons between models and training methods that confirm and strengthen previous results on shallow parsing and training methods for maximum-entropy models.

Andrew McCallum. Efficiently Inducing Features of Conditional Random Fields. In Proceedings of the 19th Conference in Uncertainty in Artificial Intelligence (UAI-2003), 2003.

Conditional Random Fields (CRFs) are undirected graphical models, a special case of which correspond to conditionally-trained finite state machines. A key advantage of CRFs is their great flexibility to include a wide variety of arbitrary, non-independent features of the input. Faced with this freedom, however, an important question remains: what features should be used? This paper presents an efficient feature induction method for CRFs. The method is founded on the principle of iteratively constructing feature conjunctions that would significantly increase conditional log-likelihood if added to the model. Automated feature induction enables not only improved accuracy and dramatic reduction in parameter count, but also the use of larger cliques, and more freedom to liberally hypothesize atomic input variables that may be relevant to a task. The method applies to linear-chain CRFs, as well as to more arbitrary CRF structures, such as Relational Markov Networks, where it corresponds to learning clique templates, and can also be understood as supervised structure learning. Experimental results on named entity extraction and noun phrase segmentation tasks are presented.

David Pinto, Andrew McCallum, Xing Wei and W. Bruce Croft. Table Extraction Using Conditional Random Fields. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2003), 2003.

The ability to find tables and extract information from them is a necessary component of data mining, question answering, and other information retrieval tasks. Documents often contain tables in order to communicate densely packed, multi-dimensional information. Tables do this by employing layout patterns to efficiently indicate fields and records in two-dimensional form. Their rich combination of formatting and content present difficulties for traditional language modeling techniques, however. This paper presents the use of conditional random fields (CRFs) for table extraction, and compares them with hidden Markov models (HMMs). Unlike HMMs, CRFs support the use of many rich and overlapping layout and language features, and as a result, they perform significantly better. We show experimental results on plain-text government statistical reports in which tables are located with 92% F1, and their constituent lines are classified into 12 table-related categories with 94% accuracy. We also discuss future work on undirected graphical models for segmenting columns, finding cells, and classifying them as data cells or label cells.

Andrew McCallum and Wei Li. Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons. In Proceedings of the Seventh Conference on Natural Language Learning (CoNLL), 2003.

Wei Li and Andrew McCallum. Rapid Development of Hindi Named Entity Recognition Using Conditional Random Fields and Feature Induction. In ACM Transactions on Asian Language Information Processing (TALIP), 2003.

This paper describes our application of conditional random fields with feature induction to a Hindi named entity recognition task. With only five days development time and little knowledge of this language, we automatically discover relevant features by providing a large array of lexical tests and using feature induction to automatically construct the features that most increase conditional likelihood. In an effort to reduce overfitting, we use a combination of a Gaussian prior and early stopping based on the results of 10-fold cross validation.
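As a reading aid, the usual way a Gaussian prior enters CRF training can be written as a penalised conditional log-likelihood. The form below is the standard textbook one and is an assumption here; the abstract does not state the exact objective used in the paper. The symbols are introduced for illustration: \lambda_k are the feature weights, F_k(x, y) is the k-th feature summed over the sequence, and \sigma^2 is the prior variance.

```latex
% Penalised conditional log-likelihood with a zero-mean Gaussian prior
\mathcal{L}(\lambda)
  = \sum_{i} \log p_{\lambda}\bigl(y^{(i)} \mid x^{(i)}\bigr)
    - \sum_{k} \frac{\lambda_k^{2}}{2\sigma^{2}}

% Gradient: empirical feature count minus expected count, minus the prior term
\frac{\partial \mathcal{L}}{\partial \lambda_k}
  = \sum_{i} \Bigl( F_k\bigl(x^{(i)}, y^{(i)}\bigr)
    - \mathbb{E}_{p_{\lambda}(Y \mid x^{(i)})}\bigl[ F_k\bigl(x^{(i)}, Y\bigr) \bigr] \Bigr)
    - \frac{\lambda_k}{\sigma^{2}}
```

A smaller \sigma^2 pulls the weights more strongly towards zero, which is how the prior counteracts overfitting.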

Yasemin Altun and Thomas Hofmann. Large Margin Methods for Label Sequence Learning. In Proceedings of 8th European Conference on Speech Communication and Technology (EuroSpeech), 2003.

Label sequence learning is the problem of inferring a state sequence from an observation sequence, where the state sequence may encode a labeling, annotation or segmentation of the sequence. In this paper we give an overview of discriminative methods developed for this problem. Special emphasis is put on large margin methods by generalizing multiclass Support Vector Machines and AdaBoost to the case of label sequences. An experimental evaluation demonstrates the advantages over classical approaches like Hidden Markov Models and the competitiveness with methods like Conditional Random Fields.

Simon Lacoste-Julien. Combining SVM with graphical models for supervised classification: an introduction to Max-Margin Markov Networks. CS281A Project Report, UC Berkeley, 2003.

The goal of this paper is to present a survey of the concepts needed to understand the novel Max-Margin Markov Networks (M³-net) framework, a new formalism invented by Taskar, Guestrin and Koller which combines both the advantages of the graphical models and the Support Vector Machines (SVMs) to solve the problem of multi-label multi-class supervised classification. We will compare generative models, discriminative graphical models and SVMs for this task, introducing the basic concepts at the same time, leading at the end to a presentation of the M³-net paper.

2004

Andrew McCallum, Khashayar Rohanimanesh and Charles Sutton. Dynamic Conditional Random Fields for Jointly Labeling Multiple Sequences. Workshop on Syntax, Semantics, Statistics; 16th Annual Conference on Neural Information Processing Systems (NIPS 2003), 2004.

Conditional random fields (CRFs) for sequence modeling have several advantages over joint models such as HMMs, including the ability to relax strong independence assumptions made in those models, and the ability to incorporate arbitrary overlapping features. Previous work has focused on linear-chain CRFs, which correspond to finite-state machines, and have efficient exact inference algorithms. Often, however, we wish to label sequence data in multiple interacting ways—for example, performing part-of-speech tagging and noun phrase segmentation simultaneously, increasing joint accuracy by sharing information between them. We present dynamic conditional random fields (DCRFs), which are CRFs in which each time slice has a set of state variables and edges—a distributed state representation as in dynamic Bayesian networks—and parameters are tied across slices. (They could also be called conditionally-trained Dynamic Markov Networks.) Since exact inference can be intractable in these models, we perform approximate inference using the tree-based reparameterization framework (TRP). We also present empirical results comparing DCRFs with linear-chain CRFs on natural-language data.

Kevin Murphy, Antonio Torralba and William T. Freeman. Using the forest to see the trees: a graphical model relating features, objects and scenes. In Advances in Neural Information Processing Systems 16 (NIPS 2003), 2004.

Standard approaches to object detection focus on local patches of the image, and try to classify them as background or not. We propose to use the scene context (image as a whole) as an extra source of (global) information, to help resolve local ambiguities. We present a conditional random field for jointly solving the tasks of object detection and scene classification.

Sanjiv Kumar and Martial Hebert. Discriminative Fields for Modeling Spatial Dependencies in Natural Images. In Advances in Neural Information Processing Systems 16 (NIPS 2003), 2004.

In this paper we present Discriminative Random Fields (DRF), a discriminative framework for the classification of natural image regions by incorporating neighborhood spatial dependencies in the labels as well as the observed data. The proposed model exploits local discriminative models and allows to relax the assumption of conditional independence of the observed data given the labels, commonly used in the Markov Random Field (MRF) framework. The parameters of the DRF model are learned using penalized maximum pseudo-likelihood method. Furthermore, the form of the DRF model allows the MAP inference for binary classification problems using the graph min-cut algorithms. The performance of the model was verified on the synthetic as well as the real-world images. The DRF model outperforms the MRF model in the experiments.

Ben Taskar, Carlos Guestrin and Daphne Koller. Max-Margin Markov Networks. In Advances in Neural Information Processing Systems 16 (NIPS 2003), 2004.

In typical classification tasks, we seek a function which assigns a label to a single object. Kernel-based approaches, such as support vector machines (SVMs), which maximize the margin of confidence of the classifier, are the method of choice for many such tasks. Their popularity stems both from the ability to use high-dimensional feature spaces, and from their strong theoretical guarantees. However, many real-world tasks involve sequential, spatial, or structured data, where multiple labels must be assigned. Existing kernel-based methods ignore structure in the problem, assigning labels independently to each object, losing much useful information. Conversely, probabilistic graphical models, such as Markov networks, can represent correlations between labels, by exploiting problem structure, but cannot handle high-dimensional feature spaces, and lack strong theoretical generalization guarantees. In this paper, we present a new framework that combines the advantages of both approaches: Maximum margin Markov (M³) networks incorporate both kernels, which efficiently deal with high-dimensional features, and the ability to capture correlations in structured data. We present an efficient algorithm for learning M³ networks based on a compact quadratic program formulation. We provide a new theoretical bound for generalization in structured domains. Experiments on the task of handwritten character recognition and collective hypertext classification demonstrate very significant gains over previous approaches.

Burr Settles. Biomedical Named Entity Recognition Using Conditional Random Fields and Rich Feature Sets. To appear in Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA), 2004.

A demo of the system can be downloaded here.

As the wealth of biomedical knowledge in the form of literature increases, there is a rising need for effective natural language processing tools to assist in organizing, curating, and retrieving this information. To that end, named entity recognition (the task of identifying words and phrases in free text that belong to certain classes of interest) is an important first step for many of these larger information management goals. In recent years, much attention has been focused on the problem of recognizing gene and protein mentions in biomedical abstracts. This paper presents a framework for simultaneously recognizing occurrences of PROTEIN, DNA, RNA, CELL-LINE, and CELL-TYPE entity classes using Conditional Random Fields with a variety of traditional and novel features. I show that this approach can achieve an overall F measure around 70, which seems to be the current state of the art.

Charles Sutton, Khashayar Rohanimanesh and Andrew McCallum. Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004), 2004.

In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when long-range dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges—a distributed state representation as in dynamic Bayesian networks (DBNs)—and parameters are tied across slices. Since exact inference can be intractable in such models, we perform approximate inference using several schedules for belief propagation, including tree-based reparameterization (TRP). On a natural-language chunking task, we show that a DCRF performs better than a series of linear-chain CRFs, achieving comparable performance using only half the training data.

John Lafferty, Xiaojin Zhu and Yan Liu. Kernel conditional random fields: representation and clique selection. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004), 2004.

Kernel conditional random fields (KCRFs) are introduced as a framework for discriminative modeling of graph-structured data. A representer theorem for conditional graphical models is given which shows how kernel conditional random fields arise from risk minimization procedures defined using Mercer kernels on labeled graphs. A procedure for greedily selecting cliques in the dual representation is then proposed, which allows sparse representations. By incorporating kernels and implicit feature spaces into conditional graphical models, the framework enables semi-supervised learning algorithms for structured data through the use of graph kernels. The framework and clique selection methods are demonstrated in synthetic data experiments, and are also applied to the problem of protein secondary structure prediction.

Xuming He, Richard Zemel, and Miguel Á. Carreira-Perpiñán. Multiscale conditional random fields for image labelling. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), 2004.

We propose an approach to include contextual features for labeling images, in which each pixel is assigned to one of a finite set of labels. The features are incorporated into a probabilistic framework which combines the outputs of several components. Components differ in the information they encode. Some focus on the image-label mapping, while others focus solely on patterns within the label field. Components also differ in their scale, as some focus on fine-resolution patterns while others on coarser, more global structure. A supervised version of the contrastive divergence algorithm is applied to learn these features from labeled image data. We demonstrate performance on two real-world image databases and compare it to a classifier and a Markov random field.

Yasemin Altun, Alex J. Smola, Thomas Hofmann. Exponential Families for Conditional Random Fields. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI-2004), 2004.

In this paper we define conditional random fields in reproducing kernel Hilbert spaces and show connections to Gaussian Process classification. More specifically, we prove decomposition results for undirected graphical models and we give constructions for kernels. Finally we present efficient means of solving the optimization problem using reduced rank decompositions and we show how stationarity can be exploited efficiently in the optimization process.

Michelle L. Gregory and Yasemin Altun. Using Conditional Random Fields to Predict Pitch Accents in Conversational Speech. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004), 2004.

The detection of prosodic characteristics is an important aspect of both speech synthesis and speech recognition. Correct placement of pitch accents aids in more natural sounding speech, while automatic detection of accents can contribute to better word-level recognition and better textual understanding. In this paper we investigate probabilistic, contextual, and phonological factors that influence pitch accent placement in natural, conversational speech in a sequence labeling setting. We introduce Conditional Random Fields (CRFs) to pitch accent prediction task in order to incorporate these factors efficiently in a sequence model. We demonstrate the usefulness and the incremental effect of these factors in a sequence model by performing experiments on hand labeled data from the Switchboard Corpus. Our model outperforms the baseline and previous models of pitch accent prediction on the Switchboard Corpus.

Brian Roark, Murat Saraclar, Michael Collins and Mark Johnson. Discriminative Language Modeling with Conditional Random Fields and the Perceptron Algorithm. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004), 2004.

This paper describes discriminative language modeling for a large vocabulary speech recognition task. We contrast two parameter estimation methods: the perceptron algorithm, and a method based on conditional random fields (CRFs). The models are encoded as deterministic weighted finite state automata, and are applied by intersecting the automata with word-lattices that are the output from a baseline recognizer. The perceptron algorithm has the benefit of automatically selecting a relatively small feature set in just a couple of passes over the training data. However, using the feature set output from the perceptron algorithm (initialized with their weights), CRF training provides an additional 0.5% reduction in word error rate, for a total 1.8% absolute reduction from the baseline of 39.2%.

Ryan McDonald and Fernando Pereira. Identifying Gene and Protein Mentions in Text Using Conditional Random Fields. BioCreative, 2004.

Trausti T. Kristjansson, Aron Culotta, Paul Viola and Andrew McCallum. Interactive Information Extraction with Constrained Conditional Random Fields. In Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI 2004), 2004.

Information Extraction methods can be used to automatically "fill-in" database forms from unstructured data such as Web documents or email. State-of-the-art methods have achieved low error rates but invariably make a number of errors. The goal of an interactive information extraction system is to assist the user in filling in database fields while giving the user confidence in the integrity of the data. The user is presented with an interactive interface that allows both the rapid verification of automatic field assignments and the correction of errors. In cases where there are multiple errors, our system takes into account user corrections, and immediately propagates these constraints such that other fields are often corrected automatically. Linear-chain conditional random fields (CRFs) have been shown to perform well for information extraction and other language modelling tasks due to their ability to capture arbitrary, overlapping features of the input in a Markov model. We apply this framework with two extensions: a constrained Viterbi decoding which finds the optimal field assignments consistent with the fields explicitly specified or corrected by the user; and a mechanism for estimating the confidence of each extracted field, so that low-confidence extractions can be highlighted. Both of these mechanisms are incorporated in a novel user interface for form filling that is intuitive and speeds the entry of data—providing a 23% reduction in error due to automated corrections.
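The constrained Viterbi decoding mentioned in this abstract can be sketched as ordinary Viterbi in which any label that contradicts a user-specified position is given a score of negative infinity. The code below is a minimal illustration under that assumption and is not the authors' implementation. The names `emit`, `trans` and `constraints` are hypothetical: `emit[t][y]` and `trans[a][b]` are log-space scores, and `constraints` maps a position to the label index the user has fixed there.

```python
def constrained_viterbi(emit, trans, constraints=None):
    """Return (best_path, best_score) among label sequences that agree
    with every entry of `constraints` (position -> required label index).

    emit[t][y]  -- hypothetical log-score of label y at position t
    trans[a][b] -- hypothetical log-score of the transition a -> b
    """
    constraints = constraints or {}
    n_labels = len(trans)
    neg_inf = float("-inf")

    def allowed(t, y):
        # A label is allowed unless the user fixed a different one at t.
        return t not in constraints or constraints[t] == y

    # delta[y]: best score of any allowed path ending in label y so far.
    delta = [emit[0][y] if allowed(0, y) else neg_inf for y in range(n_labels)]
    backpointers = []

    for t in range(1, len(emit)):
        new_delta, pointers = [], []
        for b in range(n_labels):
            best_a = max(range(n_labels), key=lambda a: delta[a] + trans[a][b])
            score = delta[best_a] + trans[best_a][b] + emit[t][b]
            new_delta.append(score if allowed(t, b) else neg_inf)
            pointers.append(best_a)
        delta = new_delta
        backpointers.append(pointers)

    # Follow the back-pointers from the best final label.
    last = max(range(n_labels), key=lambda y: delta[y])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]
```

With no constraints the function reduces to plain Viterbi. Fixing a single position can change the best labels at neighbouring positions as well, which is the propagation effect the abstract describes. The sketch assumes every constrained label is a valid index.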

Thomas G. Dietterich, Adam Ashenfelter and Yaroslav Bulatov. Training Conditional Random Fields via Gradient Tree Boosting. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004), 2004.

Conditional Random Fields (CRFs; Lafferty, McCallum, & Pereira, 2001) provide a flexible and powerful model for learning to assign labels to elements of sequences in such applications as part-of-speech tagging, text-to-speech mapping, protein and DNA sequence analysis, and information extraction from web pages. However, existing learning algorithms are slow, particularly in problems with large numbers of potential input features. This paper describes a new method for training CRFs by applying Friedman's (1999) gradient tree boosting method. In tree boosting, the CRF potential functions are represented as weighted sums of regression trees. Regression trees are learned by stage-wise optimizations similar to Adaboost, but with the objective of maximizing the conditional likelihood P(Y|X) of the CRF model. By growing regression trees, interactions among features are introduced only as needed, so although the parameter space is potentially immense, the search algorithm does not explicitly consider the large space. As a result, gradient tree boosting scales linearly in the order of the Markov model and in the order of the feature interactions, rather than exponentially like previous algorithms based on iterative scaling and gradient descent.

John Lafferty, Yan Liu and Xiaojin Zhu. Kernel Conditional Random Fields: Representation, Clique Selection, and Semi-Supervised Learning. Technical Report CMU-CS-04-115, Carnegie Mellon University, 2004.

Kernel conditional random fields are introduced as a framework for discriminative modeling of graph-structured data. A representer theorem for conditional graphical models is given which shows how kernel conditional random fields arise from risk minimization procedures defined using Mercer kernels on labeled graphs. A procedure for greedily selecting cliques in the dual representation is then proposed, which allows sparse representations. By incorporating kernels and implicit feature spaces into conditional graphical models, the framework enables semi-supervised learning algorithms for structured data through the use of graph kernels. The clique selection and semi-supervised methods are demonstrated in synthetic data experiments, and are also applied to the problem of protein secondary structure prediction.

Fuchun Peng and Andrew McCallum. Accurate Information Extraction from Research Papers using Conditional Random Fields. In Proceedings of Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics (HLT/NAACL-04), 2004.

With the increasing use of research paper search engines, such as CiteSeer, for both literature search and hiring decisions, the accuracy of such systems is of paramount importance. This paper employs Conditional Random Fields (CRFs) for the task of extracting various common fields from the headers and citation of research papers. The basic theory of CRFs is becoming well-understood, but best-practices for applying them to real-world data requires additional exploration. This paper makes an empirical exploration of several factors, including variations on Gaussian, exponential and hyperbolic priors for improved regularization, and several classes of features and Markov order. On a standard benchmark data set, we achieve new state-of-the-art performance, reducing error in average F1 by 36%, and word error rate by 78% in comparison with the previous best SVM results. Accuracy compares even more favorably against HMMs.

Yasemin Altun, Thomas Hofmann and Alexander J. Smola. Gaussian process classification for segmenting and annotating sequences. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004), 2004.

Many real-world classification tasks involve the prediction of multiple, inter-dependent class labels. A prototypical case of this sort deals with prediction of a sequence of labels for a sequence of observations. Such problems arise naturally in the context of annotating and segmenting observation sequences. This paper generalizes Gaussian Process classification to predict multiple labels by taking dependencies between neighboring labels into account. Our approach is motivated by the desire to retain rigorous probabilistic semantics, while overcoming limitations of parametric methods like Conditional Random Fields, which exhibit conceptual and computational difficulties in high-dimensional input spaces. Experiments on named entity recognition and pitch accent prediction tasks demonstrate the competitiveness of our approach.

Yasemin Altun and Thomas Hofmann. Gaussian Process Classification for Segmenting and Annotating Sequences. Technical Report CS-04-12, Department of Computer Science, Brown University, 2004.

Multiclass classification refers to the problem of assigning labels to instances where labels belong to some finite set of elements. Often, however, the instances to be labeled do not occur in isolation, but rather in observation sequences. One is then interested in predicting the joint label configuration, i.e. the sequence of labels, using models that take possible interdependencies between label variables into account. This scenario subsumes problems of sequence segmentation and annotation. In this paper, we investigate the use of Gaussian Process (GP) classification for label sequences.

2005

Cristian Sminchisescu, Atul Kanaujia, Zhiguo Li and Dimitris Metaxas. Conditional Models for Contextual Human Motion Recognition. In Proceedings of the International Conference on Computer Vision (ICCV 2005), Beijing, China, 2005.

We present algorithms for recognizing human motion in monocular video sequences, based on discriminative Conditional Random Field (CRF) and Maximum Entropy Markov Models (MEMM). Existing approaches to this problem typically use generative (joint) structures like the Hidden Markov Model (HMM). Therefore they have to make simplifying, often unrealistic assumptions on the conditional independence of observations given the motion class labels and cannot accommodate overlapping features or long term contextual dependencies in the observation sequence. In contrast, conditional models like the CRFs seamlessly represent contextual dependencies, support efficient, exact inference using dynamic programming, and their parameters can be trained using convex optimization. We introduce conditional graphical models as complementary tools for human motion recognition and present an extensive set of experiments that show how these typically outperform HMMs in classifying not only diverse human activities like walking, jumping, running, picking or dancing, but also for discriminating among subtle motion styles like normal walk and wander walk.

Ariadna Quattoni, Michael Collins and Trevor Darrell. Conditional Random Fields for Object Recognition. In Advances in Neural Information Processing Systems 17 (NIPS 2004), 2005.

We present a discriminative part-based approach for the recognition of object classes from unsegmented cluttered scenes. Objects are modeled as flexible constellations of parts conditioned on local observations found by an interest operator. For each object class the probability of a given assignment of parts to local features is modeled by a Conditional Random Field (CRF). We propose an extension of the CRF framework that incorporates hidden variables and combines class conditional CRFs into a unified framework for part-based object recognition. The parameters of the CRF are estimated in a maximum likelihood framework and recognition proceeds by finding the most likely class under our model. The main advantage of the proposed CRF framework is that it allows us to relax the assumption of conditional independence of the observed data (i.e. local features) often used in generative approaches, an assumption that might be too restrictive for a considerable number of object classes. We illustrate the potential of the model in the task of recognizing cars from rear and side views.

Joseph Bockhorst and Mark Craven. Markov Networks for Detecting Overlapping Elements in Sequence Data. In Advances in Neural Information Processing Systems 17 (NIPS 2004), 2005.

Many sequential prediction tasks involve locating instances of patterns in sequences. Generative probabilistic language models, such as hidden Markov models (HMMs), have been successfully applied to many of these tasks. A limitation of these models, however, is that they cannot naturally handle cases in which pattern instances overlap in arbitrary ways. We present an alternative approach, based on conditional Markov networks, that can naturally represent arbitrarily overlapping elements. We show how to efficiently train and perform inference with these models. Experimental results from a genomics domain show that our models are more accurate at locating instances of overlapping patterns than are baseline models based on HMMs.

Antonio Torralba, Kevin P. Murphy, William T. Freeman. Contextual models for object detection using boosted random fields. In Advances in Neural Information Processing Systems 17 (NIPS 2004), 2005.

We seek to both detect and segment objects in images. To exploit both local image data as well as contextual information, we introduce Boosted Random Fields (BRFs), which uses Boosting to learn the graph structure and local evidence of a conditional random field (CRF). The graph structure is learned by assembling graph fragments in an additive model. The connections between individual pixels are not very informative, but by using dense graphs, we can pool information from large regions of the image; dense models also support efficient inference. We show how contextual information from other objects can improve detection performance, both in terms of accuracy and speed, by using a computational cascade. We apply our system to detect stuff and things in office and street scenes.

Sunita Sarawagi and William W. Cohen. Semi-Markov Conditional Random Fields for Information Extraction. In Advances in Neural Information Processing Systems 17 (NIPS 2004), 2005.

We describe semi-Markov conditional random fields (semi-CRFs), a conditionally trained version of semi-Markov chains. Intuitively, a semi-CRF on an input sequence x outputs a "segmentation" of x, in which labels are assigned to segments (i.e., subsequences) of x rather than to individual elements x_i of x. Importantly, features for semi-CRFs can measure properties of segments, and transitions within a segment can be non-Markovian. In spite of this additional power, exact learning and inference algorithms for semi-CRFs are polynomial-time—often only a small constant factor slower than conventional CRFs. In experiments on five named entity recognition problems, semi-CRFs generally outperform conventional CRFs.

Yuan Qi, Martin Szummer and Thomas P. Minka. Bayesian Conditional Random Fields. To appear in Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (AISTATS 2005), 2005.

We propose Bayesian Conditional Random Fields (BCRFs) for classifying interdependent and structured data, such as sequences, images or webs. BCRFs are a Bayesian approach to training and inference with conditional random fields, which were previously trained by maximizing likelihood (ML) (Lafferty et al., 2001). Our framework eliminates the problem of overfitting, and offers the full advantages of a Bayesian treatment. Unlike the ML approach, we estimate the posterior distribution of the model parameters during training, and average over this posterior during inference. We apply an extension of EP method, the power EP method, to incorporate the partition function. For algorithmic stability and accuracy, we flatten the approximation structures to avoid two-level approximations. We demonstrate the superior prediction accuracy of BCRFs over conditional random fields trained with ML or MAP on synthetic and real datasets.

Aron Culotta, David Kulp and Andrew McCallum. Gene Prediction with Conditional Random Fields. Technical Report UM-CS-2005-028. University of Massachusetts, Amherst, 2005.

Given a sequence of DNA nucleotide bases, the task of gene prediction is to find subsequences of bases that encode proteins. Reasonable performance on this task has been achieved using generatively trained sequence models, such as hidden Markov models. We propose instead the use of a discriminatively trained sequence model, the conditional random field (CRF). CRFs can naturally incorporate arbitrary, non-independent features of the input without making conditional independence assumptions among the features. This can be particularly important for gene finding, where including evidence from protein databases, EST data, or tiling arrays may improve accuracy. We evaluate our model on human genomic data, and show that CRFs perform better than HMM-based models at incorporating homology evidence from protein databases, achieving a 10% reduction in base-level errors.

Yang Wang and Qiang Ji. A Dynamic Conditional Random Field Model for Object Segmentation in Image Sequences. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), Volume 1, 2005.

This paper presents a dynamic conditional random field (DCRF) model to integrate contextual constraints for object segmentation in image sequences. Spatial and temporal dependencies within the segmentation process are unified by a dynamic probabilistic framework based on the conditional random field (CRF). An efficient approximate filtering algorithm is derived for the DCRF model to recursively estimate the segmentation field from the history of video frames. The segmentation method employs both intensity and motion cues, and it combines dynamic information and spatial interaction of the observed data. Experimental results show that the proposed approach effectively fuses contextual constraints in video sequences and improves the accuracy of object segmentation.

Wednesday, August 5, 2009

Thursday, July 30, 2009

Biological Background: a rough overview of how DNA works

Living things are made up of cells.

Cells contain chromosomes.
Chromosomes consist of nucleotide sequences wound together in a double helix.
Each nucleotide contains a base, so nucleotides are sometimes simply called bases.

A nucleotide can be one of the following four:
Adenine, abbreviated A
Cytosine, abbreviated C
Guanine, abbreviated G
Thymine, abbreviated T

In the transcription step, the DNA strand unwinds so that it can be transcribed.
The transcript that carries the code is called pre-mRNA.
In eukaryotes, pre-mRNA contains both parts that will be used and parts that will not; the unused parts get cut out later.
The parts that will be used, meaning the parts that can go on to translation, are called exons.
The parts that are not used are called introns, and they are cut out in a step called splicing.
Once the introns are cut out of the pre-mRNA, only the exons remain.
Join all the exons together and you get the real mRNA, sometimes called mature mRNA. I'm not sure whether people pronounce it "ma-chure mRNA" or "major mRNA". I think "ma-chure" sounds better, said with a bit of an accent.

Now we have the real mRNA.
Translation does not use every part of the mRNA, only the part called the CDS (Coding Sequence).
Only the CDS is translated into amino acids.
During translation, a ribosome reads the CDS part of the mRNA, 3 bases at a time.
3 bases = 1 codon. Wow, so many technical terms.
1 codon translates to 1 amino acid (except stop codons, which end translation).
Many amino acids joined together = a protein.
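
The splicing and translation steps above can be sketched in a few lines of Python. This is only a toy illustration, not a real gene-finding tool: the function names `splice` and `translate`, the example sequence, and the exon coordinates are all made up for this sketch. It also works in the DNA alphabet (T instead of U), since A, C, G, T are the four letters listed in this post, and it assumes the CDS starts at position 0 of the spliced sequence.

```python
from itertools import product

# Standard genetic code, written with DNA letters (T instead of U).
# Codons are enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def splice(pre_mrna, exons):
    """Keep only the exon slices (start, end), end exclusive; introns are dropped."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def translate(cds):
    """Read the coding sequence 3 bases (1 codon) at a time until a stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":  # stop codon: translation ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical pre-mRNA: exon 1 + intron + exon 2 (coordinates are assumptions).
pre_mrna = "ATGGCC" + "GTAAGT" + "AAATGA"
exons = [(0, 6), (12, 18)]

mature = splice(pre_mrna, exons)
protein = translate(mature)
print(mature, protein)
```

Here the intron in the middle is thrown away, the two exons are joined, and the ribosome-like loop reads the result codon by codon. Real genes need the exon boundaries to be predicted first, which is exactly the hard part.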

Proteins are us, and we are lumps of protein. Proteins in our bodies include flesh and skin, body hair, hair, nails, digestive juices, enzymes, hormones, neurotransmitters, and so on. Each one does its own job in balance, which makes us who we are today.

If we go crazy, it may be because some hormone is not working normally, for example the body makes too much or too little of it, and that can make us abnormal.

And that's how it is.
**The images on this page all come from the internet. Thanks to the owners of the images (some of whom I can't identify) for their generosity.

Wednesday, July 29, 2009

Registering a Domain Name

Last night I couldn't sleep, so I got on the computer again. Lately I've been reading about Google AdSense. I had heard of it a long time ago, but I understood that you need your own website before you can put Google AdSense ads on it. And then, as usual, I didn't know what kind of site to make. I'm studying Bioinformatics right now, so I thought, well, why not try a site about bioinformatics? It would share knowledge with other people and should be useful to me too, since I'd have to go and dig up related topics to write about. What I read about Google AdSense recommended making a blog with Blogger, so I made a blog and put the ads on it, but nobody seems to click them, haha. Whatever, never mind.

Let's start a new project instead. After thinking it over, I want a website of my own. People say it works better than writing a blog, and it's harder too, which makes it a challenge. I've done a little web work before, and I stress, only a little. A while ago I opened a free hosting account at 000webhost.com and got 1500MB of space, which is plenty, then studied Joomla and tried uploading a site. It felt hard. Oh, I don't have any technique to present today, but I want to say that if you want to know something you have to study it and try it yourself. Before taking a shortcut, I'd like to try the straight road once. From the 000webhost hosting I got the domain name http://keepgoing.comoj.com ... hmm ... I don't know what I was thinking, but I wanted a .com of my own, so I searched for domain registrars. There are lots of them. I searched foreign ones; I'm not sure, but I figured they would be cheaper than Thai ones. There is godaddy.com, cheapname.com, and the one I thought was OK for me, hostingdude.com. What a cool name. I compared hostingdude with several other sites and they looked exactly the same, for example luckyregister.com; the prices and promotions were all identical. Prices start at several dollars. The lowest I found, and the one I signed up for, was $7.49 at hostingdude.com, but at checkout something got added and the total came to a bit over $7.60. This site is also nice because you can pay with PayPal, though most of the other sites accept PayPal too.

The good, cool domain names are all taken already. I'm doing bioinformatics, and names like bioinformatics.com, bioinfo.com and bioin4matics.com were already gone, so I settled for bioin4.com. For now I've linked it to the keepgoing.comoj.com I already had, muddling my way through. What happens next? Stay tuned.

Tuesday, July 28, 2009

What Are Stocks?

What is a stock?
A share here is like a partnership share, as when we go in with friends or acquaintances to open a shop of some kind. For example, when we say we bought PTT shares or DTAC shares, it's like pooling our money with them so they can use it to grow the business. When they make a profit, they hand part of it back to us as a dividend. We only put in money; they already have a management team and staff in place. They list their company on the stock market to raise money from other people and use it to grow the business. On the other hand, if the company loses money or goes bust, we share in the consequences too; for example, we may lose the money we put in.

Where are stocks traded?
Let's take Thailand first. You buy them at the market, of course, but you can't just walk in with cash. You have to buy through a broker (how that works is a topic for another day). The stock market is where companies gather and open themselves up to investment from the general public. You may have heard of PTT stock, Kasikorn Bank stock, CP stock, and many more. Shares are sold in units: at 5 baht per unit, buying 100 units costs 500 baht. The lowest possible share price is 0.01 baht per unit, which is 1 satang.

Thailand has 2 stock markets, or stock exchanges:
1. The Stock Exchange of Thailand, website http://www.set.or.th/th/index.html
This market first opened for trading on 30 April 1975 (B.E. 2518). Wow, that's a bit over 34 years ago now. The goods sold in this market, meaning the companies listed on it, are big companies with a lot of registered capital, over 100 million baht. There are now almost 500 companies on the market (497 when I last looked, and there will probably be more in the future).
2. mai (Market for Alternative Investment)
This one first opened for trading on 17 September 2001 (B.E. 2544). It is a stock market for smaller businesses, with registered capital of no more than 100 million baht. There are currently 54 companies listed on it.

That's enough for now. Next, some stock games to recommend.

The first is from the Sinthorn room on Pantip. If you're already a Pantip member you can play with that account. There's a manual to read too. You start with 1,000,000 baht of play money to try buying and selling, and see how far you can grow your million. Stock prices are based on the Stock Exchange of Thailand.
Website http://stockgame.pantip.com/

The second is the online stock investment game from Click2Win. It also gives you 1,000,000 baht to start. Prices go up and down just like on the real exchange, and the market's opening and closing times are very realistic too. They're already on the third round of the competition. You can still sign up now and get some practice. Play well and there are prizes, just so you know.

The last one is Forex, from abroad. It's not exactly a game. This market trades currencies, that is, exchange rates. I don't really understand it. They give you $5 of real money plus $10,000 of play money, but this one is rather complicated. You can search for it on Google; lots of Thai people have written articles about Forex. You can buy the Dow Jones too. Anyone who wants a taste of the Dow Jones, go and give it a try.

Have fun saving. Careful, though: trade too much and you won't sleep soundly, jumpy that your stocks will fall, hehe.

Welcome to Chiang Mai University

I came across this today. So stylishly written, and the language is sooooo elegant.
The reality today is that CMU has a huge number of cars and not enough parking. Holy smokes, Robin (why did I have to say that?). And CMU rebuilds its roads all the time, because they build one, it breaks, they build it again, it breaks again - -" And on the Faculty of Social Sciences parking lot they're putting up a Chalermprakiat building (sorry, but everything seems to get named Chalermprakiat and I'm tired of it). With the building going up, another parking area has disappeared, so now at midday the area in front of the co-op is packed with parked cars! I think a parking building would have been better. Putting up a building without any parking shows no thought for the future. Enough, or this will get long. Let's read what the honorable P-r-e-s-i-d-e-n-t wrote.
From the very first page of CMU's English website, folks: http://www.cmu.ac.th/cmueng2008/index.php

----------------------------------------------------

Welcome to Chiang Mai University

Outside of Bangkok, CMU, known locally as Mor Chor, is Thailand's oldest, largest, and most renowned institute of higher education. As President, it gives me great pleasure to introduce you to our university. Step onto our campus and you will find yourself entering a world where education is a living force, an irresistible energy that stimulates the creative and innovative spirit that resides within all of those who take up the challenge of the life-long pursuit of knowledge. Today, we live in a world driven by a hitherto unprecedented acceleration in the rate of technological advancement and its concomitant demands for increasingly higher standards of academic excellence. Chiang Mai University is acknowledged as the pre-eminent center for study in the North of Thailand. To maintain this position of excellence, we continually strive to improve, to ensure that our range of disciplines, our teaching methodology and our research activities reflect world standards. Our most recent advance in curriculum development has been to adopt an interdisciplinary educational system, designed to offer students the utmost flexibility in creating study programs.
In pursuit of our goal, we welcome both international students and international cooperation in collaborative research ventures. Internationalisation is the cornerstone of our continuing process of expansion, development and improvement of academic standards. Through constructive interaction with our international partners; in teaching, research and through contact with professional associations, we seek innovative solutions to local, national and global problems. We consider these to be mutually beneficial partnerships – and we are delighted that so many of our visitors report having gained unique insights from their first-hand experience of our Thai cultural approach.
Our current, ongoing drive to enhance both the quantity and quality of the research projects conducted at CMU means that we are continually seeking new international partners for collaborative research. Such mutually beneficial collaboration will combine our national strengths with those of partner universities and institutes world-wide.
We warmly welcome international students to Chiang Mai and greatly value the broad perspectives and varieties of experience that they bring with them to share with our Thai students. The continuing increase in the number of our international students reassures us that we are successfully working towards our goal of providing world-class standards in education. Visiting students, studying, in English, in one of the many international short courses and postgraduate programmes that we offer, discover in Chiang Mai a city where modern cosmopolitan life is an integral part of a tapestry woven on a cloth of timeless tranquility.
The ambience and learning facilities at Chiang Mai University are unmatched in Northern Thailand. I am confident that, no matter what your academic interests, if you are determined to make the best of your abilities you will feel at home amongst the staff and students at CMU: people who share a common goal to make a full contribution in tomorrow's exciting, and challenging world.
Welcome to CMU.

Pongsak Angkasith, Ed.D.
President

Monday, July 27, 2009

Making Yoghurt

(Image borrowed from Google, as usual.)

First of all, yoghurt is not spoiled milk. If we define spoiled food as food that has gone bad in a way we don't want, what does that mean here?

We take milk to make yoghurt and get the yoghurt we wanted, so we have achieved the purpose of making yoghurt.

Make yoghurt, get yoghurt = not spoiled milk

Make yoghurt, get who-knows-what = spoiled milk

History of yoghurt

From Wikipedia: historians believe that yoghurt was part of the diet of the Thracians, the oldest ancestors of the Bulgarians. The Thracians were skilled at raising sheep. In Thracian the word yog means thick, and urt means milk; the word yoghurt probably came from compounding the two. In ancient times, around the 4th to 6th centuries BC, the Thracians stored milk in bags made of sheepskin, which they wore at the waist wherever they went. Body warmth, together with the microbes in the sheepskin, set off fermentation, and the milk in the bag turned into yoghurt.

Some scientists hypothesize that what came before yoghurt was a fermented milk drink called kumis. It was made from mare's milk by tribes that came before the Bulgarians, such as the nomadic tribes that migrated from Asia to the Balkan Peninsula in AD 681.

In Western Europe, yoghurt first appeared in the 16th century at the court of King Francis I of France. The king had fallen ill with stomach trouble, and a Turkish doctor treated him with yoghurt brought from Bulgaria. Professor Christo Chomakov reports this in the book Bulgarian Yoghurt-Health and Longevity.

Ingredients

1. Milk. Any kind will do, except milk tablets, powdered milk and sweetened condensed milk.

2. Bacterial cultures: Lactobacillus bulgaricus and Streptococcus thermophilus. Ugh, too much trouble. Better to just buy some yoghurt, any brand, because yoghurt still contains live bacteria that we can use, haha.

3. Sugar (optional, as you like; if you don't want to add sugar, just buy sweetened milk).

4. Fruit, jam or honey, whatever you like to eat with yoghurt. I recommend Bestfood Squeezy; it works great.

The hard way that looks good and principled

1. Heat the milk to about 60-70 degrees Celsius. I've read that this kills some of the germs and helps the milk proteins blend well with the water, so the yoghurt doesn't separate into layers. It only takes a moment, two or three minutes.

2. Add the culture. Add more and you get yoghurt faster; add less and it takes longer. Oh, and add it when the milk is at about 40 degrees Celsius. You can estimate this: if the milk isn't too hot to touch, the bacteria should survive. What I've read says to use 1 heaping tablespoon of yoghurt per 500 ml of milk, but I just dump in the whole cup I bought. Don't overthink it, haha. Then stir so the milk and yoghurt blend.

3. Incubate so the bacteria can grow. If you have a temperature-controlled cabinet, say you work in a lab, put it in at 37 degrees Celsius for about 4 hours, then move it straight to the fridge and eat it cold. Most people probably don't have an incubator, so put the milk from step 2 in a container, close the lid, and leave it in the warmest spot in the house, or failing that, on any table. It takes about one night, 8-12 hours; just estimate. The longer you leave it, the more sour it gets. If you check and it looks OK, put the yoghurt in the fridge, where it keeps for several more days. Eat it with fruit. Mango is the absolute best.

The easy way that works

1. Buy 1 bottle of sweetened milk, either 500 ml or 1000 ml.

2. Buy 1 cup of plain yoghurt.

3. Drink a little of the milk from step 1, then add some of the yoghurt from step 2. You can add all of it; it will turn into yoghurt faster because there is more starter culture.

4. Close the milk bottle and shake.

5. Put it behind the CPU fan, where it's nicely warm, or put it in a cooler box together with a bottle of hot water. The goal is to keep the container of yoghurt warm so the bacteria can grow. After about 6-8 hours, open the bottle and check whether it has thickened. You can also incubate it in a car parked in the sun, which works great too, haha. Careful not to overheat it: the yoghurt turns sticky and stringy, and not very appetizing.

6. Once it's ready and you're satisfied: if you dare to make it, you must dare to eat it. Don't forget to eat it in front of other people, haha.

----------------------------------------------

For those who are crazy and serious about eating yoghurt, today I have something to show you (information and image from Google, as usual). Ta-da... a yoghurt spoon, designed by Nojae Park.

Machine Learning

Machine learning means we build a machine, have it learn something, and afterwards let it think for itself. For example, suppose we build a program that analyzes whether a stock will go up or down. We build a model (a model is an equation with parameters; for example, the equation y=ax+b has the parameters a and b) and then have the model learn by teaching it: if the chart looks like this and this situation occurs, the stock goes up; if another situation occurs, the stock goes down. We let it learn from many situations. Once it has finished learning, we have a model that we can put to use.
Suppose we now have a stock analysis model. We apply it to a real situation that it never saw while learning, yet it can work out for itself whether the stock will go up or down, and it outputs a prediction. That is machine learning.
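
To make the y=ax+b idea concrete, here is a minimal sketch of "teaching" that model from examples using ordinary least squares. Least squares is just one simple way to learn a and b, picked here for illustration; the function names `fit_line` and `predict` and the tiny training set are invented for this sketch and have nothing to do with real stock data.

```python
def fit_line(xs, ys):
    """Learn the parameters a and b of y = a*x + b by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept follows from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict(a, b, x):
    """Use the learned model on an input it has never seen."""
    return a * x + b


# Made-up training examples that happen to follow y = 2x + 1.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

a, b = fit_line(train_x, train_y)
prediction = predict(a, b, 10.0)  # x = 10 was not in the training data
print(a, b, prediction)
```

The learning step only sets a and b; after that the model answers for inputs it was never shown, which is the whole point of the paragraph above.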
There are many ways to make it learn, such as Hidden Markov Models, Conditional Random Fields, Support Vector Machines, Artificial Neural Networks, and a great many others.
You can learn about this on YouTube. For example, this one is from Stanford University in America, a famous private university; well, I think it's famous, haha. The example here is only the introduction, where the course is just starting. The whole machine learning course is actually there, right to the end. I'll come back and talk about it more from time to time.

Introducing the Bioinformatics Program at Chiang Mai University

I'm here to recommend this program to people who like new things and like to stride ahead along with the world. Whoa, that's a bit much. Today I'll introduce the Bioinformatics program at CMU. CMU is Chiang Mai University, which is in Chiang Mai province - -" The program is offered only at the master's level, or to put it stylishly, as a Master's degree program. The website is http://bioinfo.science.cmu.ac.th/ so go and vizit, er, visit it. The site doesn't get updated much; whoever enrolls, please update it, heh heh.

Let's take a rough look at the curriculum first. If you really enroll you may have to sit in on other courses as well. You'll be studying yourself half to death.

(Crispy fish image borrowed from Google.)

----------------------------------------------------

Master of Science Program in Bioinformatics (New Curriculum, 2006)
Name: Master of Science Program in Bioinformatics
Degree: Master of Science (Bioinformatics), abbreviated M.S. (Bioinformatics)
Plan A, Type A1: curriculum structure, study plan
Plan A, Type A2: curriculum structure, study plan
Course descriptions
New courses in Bioinformatics: 11 courses
1. ว.ชส. (223) 711 Bioinformatics Computing
2. ว.ชส. (223) 721 Statistical Method for Bioinformatics
3. ว.ชส. (223) 722 Statistical Models for Bioinformatics
4. ว.ชส. (223) 731 Fundamental Molecular Biology
5. ว.ชส. (223) 741 DNA and Protein Sequence Analysis
6. ว.ชส. (223) 788 Selected Topics in Bioinformatics I
7. ว.ชส. (223) 789 Selected Topics in Bioinformatics II
8. ว.ชส. (223) 791 Seminar in Bioinformatics I
9. ว.ชส. (223) 792 Seminar in Bioinformatics II
10. ว.ชส. (223) 797 M.S. Thesis
11. ว.ชส. (223) 799 M.S. Thesis

-------------------------------------------------

Next time I'll talk about the individual courses.

Exercising Again

Yesterday, Sunday, a little after 4 pm, I was lying down reading and fell asleep. Suddenly it felt as if I had stopped breathing.
I feel like I'm starting to get fat, so it's time to exercise. I decided to ride my bike over to see นอนอ.
I invited นอนอ to bike over and play badminton at Sala Tham. At CMU's Sala Tham, meaning the area around it and not inside it, people often come to play badminton, bring their kids to ride bikes, or go running, and some bring their dogs to run too. The atmosphere is nice, but it's outdoors, so when you play badminton the wind blows, which adds to the challenge, haha. The player on the upwind side just taps the shuttlecock and it flies a long way, that kind of thing.
My ankle still hurts, because two weeks ago I was playing badminton like this and ต้าว (ต้าว = fell down), spraining my ankle.
So today I didn't run, I just swung the racket.

Getting some exercise made me feel a bit better, after sitting at the computer all day with my back aching all over from not moving.
Going outside got my body moving, and the aches went away.

Good health can't be bought, you know.

Saturday, July 25, 2009

Introducing Bioinformatics

What is Bioinformatics? In Thai it is called ชีวสารสนเทศศาสตร์.
Simply put, it is a subject about using computers to manage the huge amount of biological data that exists, data such as DNA sequences.

Here is the explanation from Wikipedia:
Bioinformatics, or computational biology, is a field that uses knowledge from applied mathematics, statistics, informatics and computer science to solve biological problems.

In Thailand this field seems to be taught in 2 places: Kasetsart University and CMU.
Kasetsart teaches it in master's and doctoral programs. The workload is very heavy, I'm telling you; sometimes they study all day, until nine or ten at night.
CMU goes only as far as the master's level. The workload is heavy too, with an emphasis on modeling that goes all the way down into the equations.
People who come to study can have a background in any of several fields, such as computing, biology, mathematics or statistics, and pick up the rest along the way. But you have to study a lot on your own, because Thailand still lacks real experts in this area.
Computer people don't want to learn biology, and biologists don't want to learn computing. By learn, I mean learn in depth.

What you have to study, as far as I can recall
Molecular Biology
Sequence Analysis
Perl Programming
Database
R Programming
Software such as SPSS, Matlab, Maple
Statistical Modeling
etc.
That looks like so little. I can't think of any more - -"

If you have the brains and the money, come and study. At CMU the tuition is a flat 22,500 baht per semester (90,000 for the whole program), and you can take as many courses as you like.
If you have a bit more money, go and study abroad. That would probably be better.

Online Diary

I want to write a blog too.
I used to write on Diaryhub (www.diaryhub.com). But!! But, but!! Now the site has gone bust. It's gone. There is diaryis.com in its place, or maybe not, I'm not sure. Either way, diaryis is not the same as diaryhub.
I tried to root for Diaryhub through many phases. Sometimes it came back, sometimes it seemed to be getting better, but it never did. Everything I had written there disappeared. I felt sad about it, and after that I never wrote anything on the net again. Back then Diaryhub also had a [private][/private] function, which I loved, because other people couldn't see the text inside the private tag.
I wrote that diary from around Mathayom 6 (grade 12) through my first year, when I went to study at CMU. It felt good to find out that, oh, my new friends wrote diaries just like me. So cool, haha.
Today is Saturday. Must eat kao lao.
