Arabic Coding Scheme
SSA has developed a number of tools for automated coding of Modern Standard Arabic
(MSA). In addition to these coding capabilities, we have a robust web reporting
service, a large corpus of Arabic media documents, and the capability to continuously
download and analyze new media texts. The following is a brief description of each
coding capability.
Arabic Tagger Parser
Unlike English, Arabic script does not always write each word as a separate text
string: pronouns, determiners, and some prepositions attach directly to the words
around them. To analyze text and identify individual terms, SSA has developed a tool
that first parses MSA text into individual words and then tags each term with relevant
grammatical information. This tool will:
- Remove attached pronouns, preceding determiners, and preceding prepositions
- Identify parts of speech for all words
- Standardize spellings of verbs and sound plural forms of nouns and adjectives
- Identify tense, gender, number, and verb form of words
- Identify participles and verbal nouns
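The first step in the list above, removing attached determiners, prepositions, and pronouns, can be illustrated with a minimal sketch. This is not SSA's tool: the names `PREFIXES`, `SUFFIXES`, `MIN_STEM`, `strip_clitics`, and `segment` are hypothetical, the affix lists are deliberately tiny, and a real tagger would rely on a lexicon and morphological analysis rather than plain string matching.

```python
# Illustrative only: a toy clitic stripper, NOT the SSA tagger parser.
# The affix lists are hypothetical and far from complete. Single-letter
# prefixes (such as "b-" or "k-") are left out on purpose because they
# are ambiguous without a lexicon.
PREFIXES = ["وال", "بال", "لل", "ال"]  # wa+al, bi+al, li+al, al (longest first)
SUFFIXES = ["هما", "كما", "هم", "هن", "كم", "نا", "ها", "ه", "ك", "ي"]  # attached pronouns
MIN_STEM = 3  # assumed minimum stem length, to avoid over-stripping short words


def strip_clitics(token):
    """Split one token into (prefix, stem, suffix) by simple string matching."""
    prefix = suffix = ""
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= MIN_STEM:
            prefix, token = p, token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= MIN_STEM:
            suffix, token = s, token[:-len(s)]
            break
    return prefix, token, suffix


def segment(text):
    """Split text on whitespace and strip clitics from every token."""
    return [strip_clitics(tok) for tok in text.split()]
```

Under these assumptions, `segment("والكتاب كتابهم")` separates the conjunction and determiner from the first word and the attached pronoun from the second, leaving the same stem for both. Tagging parts of speech, tense, gender, and number would require additional resources that this sketch does not attempt.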
Entity Extraction and Identification
The entity extraction and identification tools identify references to specific entities
and to the people who make up or are affiliated with them, and label those references
as part of the entity. These tools identify reference variants, standardize their
spellings, and aggregate them at the country and/or group level. In addition to
general-purpose entity extraction routines, specific routines identify references to
all official country names, national adjectives, and capital cities per the US
government's list of FIPS codes. More comprehensive entity identification routines
have been developed for:
- America
- Israel
- Palestine
- Fatah
- Hamas
- Palestinian Islamic Jihad
- Hizbullah
- Lebanon
- Iraq
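The idea of identifying reference variants, standardizing their spellings, and aggregating them by entity can be sketched as follows. This is a minimal illustration under stated assumptions, not SSA's routines: the `VARIANTS` table, its handful of entries, and the function names `normalize` and `count_entities` are hypothetical, and only single whitespace-separated tokens are matched.

```python
import re
from collections import Counter

# Illustrative only: a toy variant table, NOT the SSA entity routines.
_DIACRITICS = re.compile("[\u064B-\u0652\u0640]")  # short-vowel marks plus tatweel
_ALEF_FORMS = re.compile("[\u0622\u0623\u0625]")   # alef with madda or hamza


def normalize(token):
    """Standardize spelling: drop diacritics and tatweel, unify alef forms."""
    token = _DIACRITICS.sub("", token)
    return _ALEF_FORMS.sub("\u0627", token)


# Hypothetical mapping from normalized variants (names, capital cities)
# to the entity they are aggregated under.
VARIANTS = {
    "امريكا": "America",
    "اميركا": "America",
    "واشنطن": "America",
    "حماس": "Hamas",
    "لبنان": "Lebanon",
    "بيروت": "Lebanon",
}


def count_entities(text):
    """Count references per entity across whitespace-separated tokens."""
    counts = Counter()
    for token in text.split():
        entity = VARIANTS.get(normalize(token))
        if entity:
            counts[entity] += 1
    return counts
```

With this table, the two common spellings of "America" (with and without a hamza on the initial alef) and a reference to Beirut are counted under "America" and "Lebanon" respectively. Multi-word names, attached clitics, punctuation, and references to affiliated people are outside the scope of this sketch.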