NATURAL LANGUAGE PROCESSING IN PYTHON

Learn to build intelligent text-processing systems using Python, extract meaning from language, and automate tasks effectively.

Course 6
of 8 in the CSO pathway

3 days
course duration

2 modes
physical or online

RM5,000/pax

Day 1
Fundamentals: Understanding the NLP landscape & real-world applications.
Python Pro: Syntax, data structures & JupyterLab.
Preprocessing: Text-cleaning techniques & tokenization.
Hands-on: Environment setup & cleaning text data.
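The Day 1 preprocessing topics can be sketched with plain Python; a minimal example of the kind of cleaning and tokenization covered (the regex rules and the `clean_text`/`tokenize` helper names are illustrative, not course code):

```python
import re

def clean_text(text):
    """Lowercase, strip URLs and punctuation, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # keep only letters, digits, spaces
    return re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace

def tokenize(text):
    """Whitespace tokenization after cleaning."""
    return clean_text(text).split()

print(tokenize("Check out https://example.com ... it's GREAT!!!"))
```

Note that this naive approach splits contractions ("it's" becomes "it", "s"); the course covers proper tokenizers (NLTK, spaCy) that handle such cases.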

Day 2
Tagging: POS tagging & NER with spaCy.
Sentiment: Understanding emotion in writing.
Machine Learning: Building text classifiers (scikit-learn).
Hands-on: Building a sentiment model with Logistic Regression.
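The Day 2 hands-on (a Logistic Regression sentiment model in scikit-learn) can be sketched in a few lines; the toy reviews and labels below are illustrative placeholders, not the course dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative dataset: 1 = positive, 0 = negative.
texts = [
    "great product, totally worth it",
    "excellent service and fast delivery",
    "very happy with the quality",
    "terrible experience, very disappointing",
    "item arrived broken and late",
    "not worth the money at all",
]
labels = [1, 1, 1, 0, 0, 0]

# Vectorize with TF-IDF, then fit a Logistic Regression classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["the service was excellent"]))
```

The same vectorize-then-classify pattern scales to real datasets; in class, the model is evaluated on a held-out test split rather than the training data.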

Day 3
Deep Dive: The secrets behind GPT & Transformer models.
Hugging Face: Using pre-trained models for text tasks.
Ethics: Understanding bias & ethics in AI.
Hands-on: Automating email sorting & Capstone presentation.

Hardware Requirements

Minimum
  •  Laptop/PC 64-bit
  •  CPU: Intel i3/i5 / AMD Ryzen 5 (or equivalent), 4 cores
  •  RAM: 8 GB (at least 16 GB recommended)
  •  Storage: 10–20 GB free space (for the Python environment, datasets, and models)
  •  Internet: stable connection (for downloading libraries/datasets); 10 Mbps+ recommended
  •  Browser: Chrome / Edge / Firefox (latest)

Recommended
  •  CPU: 6–8 cores (i5/i7/Ryzen 7)
  •  RAM: 16–32 GB
  •  Storage: SSD with 30–50 GB free space
  •  GPU: optional (an NVIDIA GPU helps with Transformers but is not required)
  •  If using a GPU: NVIDIA GPU with CUDA (8 GB VRAM or more)
  •  Internet: stable 10 Mbps+ (for downloading datasets & models)

Operating System:
  •  Windows 10/11 (64-bit) / macOS / Linux (Ubuntu recommended)

Python: 3.10 or 3.11 (stable with most NLP libraries)
 
Package manager / environment:
  •  Miniconda/Anaconda (recommended) or venv + pip
 
Code editor:
  •  VS Code (recommended) or PyCharm

Notebook environment:
  •  Jupyter Notebook / JupyterLab (recommended for hands-on work)

Browser
  •  Google Chrome (latest)

Git (recommended):
  •  Git + GitHub account (for version control & assignment submission)

NLP Libraries (installed during class)
  •  Core: numpy, pandas, scikit-learn
  •  Text processing: nltk, spacy, regex
  •  Visualization: matplotlib
  •  Deep learning (optional, depending on module): PyTorch or TensorFlow
  •  Transformers (optional, if the LLM module is covered): transformers, datasets

Transformers Stack
  •  transformers, datasets, tokenizers, torch (PyTorch)

Data/Model Downloads (note for participants)
  •  Some models/datasets must be downloaded during the course (e.g. spaCy models / NLTK corpora)

Optional Tools
  •  Git, Zoom/Google Meet (for online sessions/capstone)

Course Deliverables

  •  Hands-on training with guided exercises and real datasets
  •  Ready-to-use Jupyter notebooks for all modules (preprocessing, spaCy, ML, Transformers)
  •  A working sentiment analysis model (Logistic Regression) you can reuse for projects
  •  Practical spaCy workflows: POS Tagging and Named Entity Recognition (NER)
  •  Introduction and practical use of pre-trained Transformer models (Hugging Face)
  •  A capstone mini-project and final presentation deck
  •  Certificate of Completion (subject to attendance)
  •  Post-training resource pack: datasets, cheat sheets, and reference links

Fees

1.  Course Fee: RM5,000.00 / participant

2.  Included:

   2.1 3 days of in-person training (trainer-led, hands-on)
   2.2 Training materials: slides, Jupyter notebooks, code templates, practice datasets
   2.3 Practical guidance throughout the sessions (setup & troubleshooting)
   2.4 Mini-project files (portfolio-ready)
   2.5 Certificate of Participation/Completion
   2.6 Lunch + tea breaks (provided by the organiser for physical classes)

3.  Excluded:

   3.1 Participant's laptop/device (arrangements may be discussed with the organiser)
   3.2 Participant travel & accommodation costs
   3.3 Personal internet access (if required)

Learning Outcomes

1.  NLP Practitioner (beginner → job-ready basics)
Build an end-to-end text pipeline: clean → tokenize → vectorize → train → evaluate → predict.

2.  Build your own text-classification models
Examples: sentiment analysis, complaint/feedback categorisation, intent/FAQ routing.

3.  Evaluate & improve models correctly
Understand the confusion matrix, precision, recall, and F1, and carry out error analysis.

4.  Produce portfolio-ready output
2–3 mini projects + notebooks + clean code templates to show employers/clients.

5.  Translate workplace use cases into solutions
Examples: automated text review, issue tagging, short summaries/keywords, and simple search (depending on module).
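The evaluation metrics from outcome 3 can be computed by hand from the confusion-matrix counts; a minimal pure-Python sketch (the `binary_metrics` helper and the example predictions are illustrative, not course code):

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived metrics for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}

# Made-up true labels and model predictions for illustration.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
m = binary_metrics(y_true, y_pred)
print(m)
```

In practice scikit-learn's `classification_report` produces the same figures, but computing them once by hand makes the trade-off between precision and recall concrete.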

Trainer

Dr. Eravan Bin Serri
Certified Trainer and
Certified Data Analytics Specialist (Wiley Edge)