Linguistic Analysis
- Speech
- Written Language
This article focuses on Written Language rather than Speech because speech can be transcribed into written form.
Components in analysing Written Language
- Phonology : Analysing Sound / Pronunciation
- Morphology : Analysing Structure of Words.
- Example : books, booked, booking -> book; run, ran, running -> run
- Syntax : Grammar
- Semantics : Meaning of Strings and Interaction among them.
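The morphology example in the list above can be sketched in a few lines of Python. This is only a toy illustration, not a real stemmer: the `IRREGULAR` table, the `SUFFIXES` tuple and the `base_form` function are hypothetical names introduced here, and the rules are tuned to the six example words (they will mishandle many other English words).

```python
# Irregular forms cannot be derived by rule, so they are looked up directly.
# (Hypothetical table holding only the one irregular form from the example.)
IRREGULAR = {"ran": "run"}

# Suffixes are tried in this order, longest first.
SUFFIXES = ("ing", "ed", "s")


def base_form(word: str) -> str:
    """Reduce an inflected English word to a base form (toy rules only)."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        # Require a stem of at least three letters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo consonant doubling: "running" -> "runn" -> "run".
            if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word


for w in ["books", "booked", "booking", "run", "ran", "running"]:
    print(w, "->", base_form(w))
```

The lookup table comes first because no suffix rule can turn "ran" into "run"; regular forms fall through to suffix stripping.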
Now, coming to the issues:
Issues in Syntax
"The dog ate my homework." - Who did what
1. POS (Part-of-Speech) Tagging
Dog : Noun, Ate : Verb, Homework : Noun
Note: So far English POS tagging is up to 95% accurate, but it can still be improved.
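As a rough sketch of what tagging produces, the snippet below tags the example sentence by plain dictionary lookup. The `LEXICON` table and `pos_tag` function are hypothetical and hand-written for this one sentence; a real tagger learns its decisions from a tagged corpus and uses context to choose between possible tags.

```python
# Hypothetical hand-written lexicon covering only the example sentence.
LEXICON = {
    "the": "Determiner",
    "dog": "Noun",
    "ate": "Verb",
    "my": "Possessive pronoun",
    "homework": "Noun",
}


def pos_tag(sentence: str) -> list[tuple[str, str]]:
    """Tag each word by dictionary lookup; unknown words get 'Unknown'."""
    words = sentence.rstrip(".").split()
    return [(word, LEXICON.get(word.lower(), "Unknown")) for word in words]


for word, tag in pos_tag("The dog ate my homework."):
    print(f"{word} : {tag}")
```

Lookup alone cannot handle a word such as "bank", which may be a noun or a verb, which is one reason tagging accuracy stays below 100%.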
2. Identify collocations
"mother in law" and "hot dog" are each treated as a single word.
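One simple way to treat a collocation as a single word is to merge its tokens before further analysis. The sketch below assumes a small hand-made `COLLOCATIONS` set (hypothetical; in practice such a list would be derived from corpus statistics) and greedily joins the longest match at each position.

```python
# Hypothetical collocation list, limited to the two examples in the text.
COLLOCATIONS = {("mother", "in", "law"), ("hot", "dog")}
MAX_LEN = max(len(c) for c in COLLOCATIONS)


def merge_collocations(tokens: list[str]) -> list[str]:
    """Greedily join the longest known collocation starting at each position."""
    merged = []
    i = 0
    while i < len(tokens):
        for n in range(MAX_LEN, 1, -1):
            if i + n > len(tokens):
                continue  # not enough tokens left for an n-word collocation
            candidate = tuple(t.lower() for t in tokens[i : i + n])
            if candidate in COLLOCATIONS:
                merged.append("_".join(tokens[i : i + n]))
                i += n
                break
        else:
            # No collocation starts here, so keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


print(merge_collocations("My mother in law bought a hot dog".split()))
```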
3. Shallow Parsing
4. Anaphora Resolution
The dog entered my home. It scared me.
Here "It" refers to "The dog" in the first sentence. This reference must be resolved.
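A very naive resolution heuristic is to link a pronoun to the subject of the previous sentence. The sketch below approximates "subject" as the first noun in the sentence; the `NOUNS` and `PRONOUNS` sets and the `resolve_pronouns` function are hypothetical stand-ins for a real tagger and resolver, and the heuristic fails on many real texts.

```python
# Hypothetical word lists standing in for a POS tagger.
NOUNS = {"dog", "home"}
PRONOUNS = {"it"}


def resolve_pronouns(sentences: list[str]) -> list[tuple[str, str]]:
    """Link each pronoun to the first noun (roughly the subject) of the previous sentence."""
    links = []
    previous_subject = None
    for sentence in sentences:
        words = [w.strip(".").lower() for w in sentence.split()]
        for word in words:
            if word in PRONOUNS and previous_subject is not None:
                links.append((word, previous_subject))
        # Remember this sentence's first noun for the sentences that follow.
        nouns = [w for w in words if w in NOUNS]
        if nouns:
            previous_subject = nouns[0]
    return links


print(resolve_pronouns(["The dog entered my home.", "It scared me."]))
```

Choosing the most recent noun instead would wrongly pick "home", which shows why the choice of heuristic matters.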
5. Prepositional phrase attachment
I saw a man in the park with a telescope.
Does it mean that the man had the telescope, or that I saw him through the telescope?
Here the ambiguity must be resolved.
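The two readings differ only in where the phrase "with a telescope" attaches. The sketch below simply prints both bracketings; `attachment_readings` is a hypothetical helper written for illustration, and it does not decide which reading is correct.

```python
def attachment_readings(verb: str, obj: str, pp: str) -> dict[str, str]:
    """Return both bracketings for a verb, its object and a trailing prepositional phrase."""
    return {
        # The phrase modifies the verb: the seeing was done with the telescope.
        "verb attachment": f"[{verb} [{obj}] [{pp}]]",
        # The phrase modifies the noun: the man has the telescope.
        "noun attachment": f"[{verb} [{obj} [{pp}]]]",
    }


readings = attachment_readings("saw", "a man in the park", "with a telescope")
for name, bracketing in readings.items():
    print(f"{name}: {bracketing}")
```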
Issues in Semantics
Consider the word "plant":
Plant: Living Organism
Plant: Industry
The plant is producing 1000 automobiles : Here the plant is an Industry.
The plant produces apple fruit : Living Organism
The plant is close to the animal farm : Ambiguous.
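One hedged way to picture this kind of disambiguation is to count context words that hint at each sense. The cue words in `SENSE_CUES` below are hypothetical and chosen by hand to fit the three example sentences; `disambiguate` reports "Ambiguous" when no sense has more evidence than the other.

```python
# Hypothetical cue words for each sense of "plant", chosen by hand.
SENSE_CUES = {
    "Industry": {"automobiles", "producing", "workers", "factory"},
    "Living Organism": {"apple", "fruit", "leaves", "grows"},
}


def disambiguate(sentence: str) -> str:
    """Pick the sense whose cue words overlap most with the sentence."""
    words = set(sentence.lower().rstrip(".").split())
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    # A tie, including no evidence at all, is reported as ambiguous.
    return winners[0] if len(winners) == 1 else "Ambiguous"


for s in [
    "The plant is producing 1000 automobiles",
    "The plant produces apple fruit",
    "The plant is close to the animal farm",
]:
    print(s, ":", disambiguate(s))
```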
Learn from annotated examples (Statistical Learning Algorithm)
- Thousands of examples tagged by humans are used to train the algorithm.
- A training algorithm is required.
- Precision : 60-70%, but it can be improved up to 80%.
How to choose a learning algorithm?
How to tag thousands of examples?
Learning Process
- A large volume of annotated data is required for training, and fresh, unseen text is required for testing.
- The algorithm learns from previous experience (training) and classifies new data (testing).
- Various algorithms exist, including decision trees, memory-based learning and neural networks.
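The learning process above can be sketched with memory-based learning in its simplest form, a nearest-neighbour classifier: training just stores the annotated examples, and a new sentence receives the label of the stored example it shares the most words with. The `TRAINING` and `TESTING` data below are made up for illustration, and `overlap`, `classify` and `accuracy` are hypothetical helper names.

```python
# Hypothetical annotated examples: (sentence, sense of "plant").
TRAINING = [
    ("the plant producing 1000 automobiles", "Industry"),
    ("the plant hired more workers", "Industry"),
    ("the plant produces apple fruit", "Living Organism"),
    ("the plant needs water and sunlight", "Living Organism"),
]

# Hypothetical fresh sentences; the labels are used only to score the result.
TESTING = [
    ("the plant hired workers", "Industry"),
    ("the plant needs water", "Living Organism"),
]


def overlap(a: str, b: str) -> int:
    """Number of distinct words two sentences have in common."""
    return len(set(a.split()) & set(b.split()))


def classify(text: str, examples: list[tuple[str, str]]) -> str:
    """Return the label of the stored example sharing the most words with the text."""
    _, best_label = max(examples, key=lambda example: overlap(text, example[0]))
    return best_label


def accuracy(test_set: list[tuple[str, str]], examples: list[tuple[str, str]]) -> float:
    """Fraction of test sentences whose predicted label matches the human label."""
    correct = sum(classify(text, examples) == label for text, label in test_set)
    return correct / len(test_set)


for text, _ in TESTING:
    print(text, "->", classify(text, TRAINING))
print("accuracy:", accuracy(TESTING, TRAINING))
```

With only four stored examples the result says little about real performance; it only shows the train-then-test workflow.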
Issues in Information Extraction
There was a group of 8-9 people close to the Highway 82.
- Who? - 8 to 9 People.
- Where? - Highway 82.
- Proximity? Close.
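For this one sentence, the who / where / proximity slots can be filled with a regular expression. The pattern below is a hypothetical sketch written for this exact sentence shape; real extraction systems need far more robust patterns or learned models.

```python
import re
from typing import Optional

# Hypothetical pattern fitted to the example sentence only.
PATTERN = re.compile(
    r"group of (?P<low>\d+)-(?P<high>\d+) (?P<who>\w+) "
    r"(?P<proximity>close to|near|far from) (?:the )?(?P<where>Highway \d+)"
)


def extract(sentence: str) -> Optional[dict[str, str]]:
    """Fill the who / where / proximity slots, or return None if nothing matches."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return {
        "who": f"{match['low']} to {match['high']} {match['who']}",
        "where": match["where"],
        "proximity": match["proximity"],
    }


print(extract("There was a group of 8-9 people close to the Highway 82."))
```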
Issues in Information retrieval
- A huge collection of text.
- A query
I went to the bank to observe the sunset.
The word "bank" has several senses and grammatical forms:
- River bank (noun)
- Financial institution, a store of money (noun)
- To depend on, as in "bank on" (verb): with the help of syntactic analysis, this verb sense can be eliminated.
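To make the retrieval setting concrete, here is a minimal sketch of an inverted index over a tiny, made-up collection (`DOCUMENTS`, `build_index` and `search` are hypothetical names). It shows why plain keyword matching is not enough: the query "bank" returns documents about both senses.

```python
from collections import defaultdict

# Hypothetical three-document collection.
DOCUMENTS = {
    1: "I went to the bank to observe the sunset",
    2: "The bank approved my loan",
    3: "We watched the sunset from the hill",
}


def build_index(documents: dict[int, str]) -> dict[str, set[int]]:
    """Map each lower-cased word to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return the ids of documents containing every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(DOCUMENTS)
print(sorted(search(index, "bank")))         # both senses are retrieved
print(sorted(search(index, "bank sunset")))  # extra context narrows the result
```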
Question Answering
- What is the height of Mount Everest? - About 29,000 feet.
- Question answering has 40-50% accuracy.
- We can use common sense knowledge to increase accuracy.
- Accuracy can also be increased by performing domain-specific question answering.
- Finding information across languages.
- Example: What is the minimum age requirement for car rental in Italy?
Issues in Machine Translation
- Text to Text
- Speech to Speech
- English - French (text)
- English - Chinese (Speech)